Dataflow pipeline for BigQuery to Bigtable load

Hi @ms4446 

Hope you are doing well.

Here, we are trying to load BigQuery view data into Bigtable. 

1. We tried to load the BigQuery view data using the Flex template, but unfortunately it works only with BigQuery tables, not with views.

2. We created custom code using Apache Beam in Java. We used the HBase client Put and Mutation classes to convert BigQuery rows into the Bigtable schema mapping, and used CloudBigtableIO.writeToTable(new CloudBigtableTableConfiguration()) to write into Bigtable.

Transformation part:

static final DoFn<TableRow, Mutation> MUTATION_TRANSFORM = new DoFn<TableRow, Mutation>() {
    private static final long serialVersionUID = 1L;

    @ProcessElement
    public void processElement(ProcessContext c) throws Exception {
      TableRow row = c.element();
      // Use the cust_id column as the Bigtable row key.
      String rowKey = (String) row.get("cust_id");
      Put p = new Put(Bytes.toBytes(rowKey));
      for (Map.Entry<String, Object> field : row.entrySet()) {
        // Skip the row-key column; write every other field into FAMILY.
        if (!field.getKey().equals("cust_id")) {
          p.addColumn(FAMILY, Bytes.toBytes(field.getKey()),
              Bytes.toBytes(String.valueOf(field.getValue())));
        }
      }
      c.output(p);
    }
  };

 

 

I am able to submit the job, but at runtime I get the following class-loading errors:

  • Error message from worker: java.lang.VerifyError: class com.google.protobuf.LiteralByteString overrides final method com.google.protobuf.ByteString.peekCachedHashCode()I
  • Error message from worker: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.util.ByteStringer

This looks like a dependency issue, but I am able to import ByteStringer in my code without any problem.

Could you please help me with a proper resolution or the correct dependency versions?

Your support is much appreciated!

Solved
1 ACCEPTED SOLUTION

Hi @Rengavz ,

The BigQuery to Bigtable Flex Template is designed for tables, not views, most likely because it reads directly from table storage (via export or the Storage Read API), and that read path does not work on logical views.

These errors typically stem from dependency conflicts or missing libraries. The specific errors you're encountering indicate a mismatch in the Protobuf library version and potential issues with the HBase client initialization.

Below is a possible approach to tackle these issues:

  1. Alternative to Flex Template

    Since the Flex Template isn't a perfect fit for views, consider these options:

    • Materialized View: Create a materialized view in BigQuery that periodically refreshes from your underlying view. This essentially provides a "table-like" structure for the Dataflow pipeline to work with.

    • Custom Query: If the materialized view isn't feasible, modify your Apache Beam pipeline to directly query the BigQuery view using the BigQueryIO.read() transform with a query string.

  2. Resolving Dependency Issues

    • Protobuf Compatibility: The java.lang.VerifyError related to Protobuf indicates a version conflict. Ensure your project dependencies explicitly use the same Protobuf version across all components (Beam, the Bigtable client, and any other libraries); one way to enforce this is sketched right after this list.

    • HBase Client Initialization: The java.lang.NoClassDefFoundError with ByteStringer suggests a problem initializing the HBase client. Double-check that you're using a compatible version of the HBase client library that matches your HBase cluster version. If you're using a managed Bigtable instance, ensure your client library version aligns with Google's recommendations.
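
    As a minimal sketch of the pinning idea (the Protobuf version below is an assumption; match it to whatever your Beam and Bigtable client releases are compiled against), a Maven <dependencyManagement> block forces a single Protobuf version even for transitive dependencies:

    <dependencyManagement>
      <dependencies>
        <!-- Assumed version for illustration: pin one protobuf-java for the whole build. -->
        <dependency>
          <groupId>com.google.protobuf</groupId>
          <artifactId>protobuf-java</artifactId>
          <version>3.21.12</version>
        </dependency>
      </dependencies>
    </dependencyManagement>

    Running mvn dependency:tree -Dincludes=com.google.protobuf then shows which libraries were pulling in conflicting Protobuf versions.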

  3. Refining Your Beam Pipeline 

     
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);
    
    // Bigtable destination settings.
    String projectId = "your-project-id";
    String instanceId = "your-instance-id";
    String tableId = "your-table-id";
    
    p.apply("Read from BigQuery view", BigQueryIO.readTableRows()
           .fromQuery("SELECT * FROM `your_project.your_dataset.your_view`")  // Query your view
           .usingStandardSql())
     .apply("Transform to Mutations", ParDo.of(MUTATION_TRANSFORM))
     .apply("Write to Bigtable", CloudBigtableIO.writeToTable(
           new CloudBigtableTableConfiguration.Builder()
               .withProjectId(projectId)
               .withInstanceId(instanceId)
               .withTableId(tableId)
               .build()));
    
    p.run();
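
    One note on this approach: the fromQuery path runs an actual BigQuery query job and stages the results in a temporary table before the pipeline reads them, which is why it works for views where the direct table read path does not.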
    


2 REPLIES

@ms4446 Thank you for your support and detailed explanation.

I was able to resolve the issue with the dependencies below.

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-shaded-client</artifactId>
    <version>1.7.2</version>
</dependency>
 
<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-beam</artifactId>
  <version>2.12.0</version>
</dependency>
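
For anyone who hits the same errors: hbase-shaded-client bundles relocated copies of its third-party dependencies, including Protobuf, and I believe that is what eliminates the LiteralByteString VerifyError here.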