HBase Batch Source

Plugin version: 2.11.0

Batch source that reads from a column family in an HBase table. Unlike the Table source, it does not use a CDAP dataset; it reads directly from HBase.

Use this source when a batch pipeline needs to read a column family from an HBase table. For example, you might read from an HBase table, filter out some records, and then write the results to a Database table.

Configuration

Property

Macro Enabled?

Description

Reference Name

No

Required. This will be used to uniquely identify this source for lineage, annotating metadata, etc.

HBase Table Name

Yes

Required. The name of the table to read from.

HBase Column Family

Yes

Required. The name of the column family to read from.

Zookeeper Quorum String

Yes

Optional. The ZooKeeper quorum for the HBase instance you are reading from, given as a comma-separated list of the hosts that make up the quorum. You can find the correct value in the hbase.zookeeper.quorum setting in your hbase-site.xml file.

Default is 'localhost'.

Zookeeper Client Port

Yes

Optional. The client port used to connect to the ZooKeeper quorum. You can find the correct value in the hbase.zookeeper.property.clientPort setting in your hbase-site.xml file.

Default is 2181.

Row Field Name

No

Required. The name of the field whose value is read from the row key rather than from a column. This field must be present in the output schema and must not be nullable.

Output Schema

No

Required. The output schema for the data.
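The ZooKeeper defaults and the Row Field Name rules described above can be illustrated with a short Java sketch. It is hypothetical and is not the plugin's implementation: the class name, the Field record, and both method names are illustrative assumptions, and the output schema is modeled as a plain map instead of a CDAP schema object. The two keys it produces, hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort, are the HBase client settings that the two ZooKeeper properties correspond to. Records require Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: resolves the optional ZooKeeper properties to their
// documented defaults and checks the Row Field Name rules. Not plugin code.
public class HBaseSourceConfigSketch {

    // Minimal stand-in for a schema field: a type name plus nullability.
    record Field(String type, boolean nullable) {}

    static final String DEFAULT_QUORUM = "localhost";
    static final int DEFAULT_CLIENT_PORT = 2181;

    // Builds HBase client settings, falling back to the documented defaults
    // when the optional properties are left empty.
    static Map<String, String> zookeeperSettings(String quorum, Integer clientPort) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hbase.zookeeper.quorum",
                 quorum == null || quorum.isBlank() ? DEFAULT_QUORUM : quorum);
        conf.put("hbase.zookeeper.property.clientPort",
                 String.valueOf(clientPort == null ? DEFAULT_CLIENT_PORT : clientPort));
        return conf;
    }

    // Enforces the Row Field Name rules: present in the schema, not nullable.
    static void validateRowField(String rowField, Map<String, Field> schema) {
        Field field = schema.get(rowField);
        if (field == null) {
            throw new IllegalArgumentException(
                "Row field '" + rowField + "' is not in the output schema");
        }
        if (field.nullable()) {
            throw new IllegalArgumentException(
                "Row field '" + rowField + "' must not be nullable");
        }
    }

    public static void main(String[] args) {
        Map<String, Field> schema = new LinkedHashMap<>();
        schema.put("id", new Field("long", false));
        schema.put("name", new Field("string", false));
        schema.put("birthyear", new Field("int", false));

        validateRowField("id", schema); // passes: present and non-nullable
        System.out.println(zookeeperSettings("host1,host2,host3", null));
    }
}
```

Running main prints {hbase.zookeeper.quorum=host1,host2,host3, hbase.zookeeper.property.clientPort=2181}, because the client port was left unset and falls back to 2181.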

Example

This example reads from the attr column family of an HBase table named users:

Property

Value

Reference Name

hbase

HBase Table Name

users

HBase Column Family

attr

Zookeeper Quorum String

host1,host2,host3

Zookeeper Client Port

2181

Row Field Name

id

Output Schema

{ "type": "record", "name": "user", "fields": [ {"name": "id", "type": "long"}, {"name": "name", "type": "string"}, {"name": "birthyear", "type": "int"} ] }

It outputs records with this schema:

field name

type

id

long

name

string

birthyear

int

The id field is read from the row key of the table. The name and birthyear fields are read from the name and birthyear columns of the attr column family, respectively. The source ignores any other columns in the table.
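The row-to-record mapping in this example can be sketched in Java. This is a hypothetical illustration, not the plugin's code: the class and method names are assumptions, the schema is a plain map from field name to type, and cell values are assumed to have been written with HBase's Bytes.toBytes encoding (big-endian for long and int, UTF-8 for strings). The sketch writes null for a missing column and does not enforce nullability.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of mapping one HBase row to an output record.
// Assumes Bytes.toBytes-style encoding: big-endian numbers, UTF-8 strings.
public class RowToRecordSketch {

    // Decodes raw cell bytes according to the field's schema type.
    static Object decode(String type, byte[] raw) {
        switch (type) {
            case "long":   return ByteBuffer.wrap(raw).getLong();
            case "int":    return ByteBuffer.wrap(raw).getInt();
            case "string": return new String(raw, StandardCharsets.UTF_8);
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Builds one record from a row key and the cells of the configured family.
    // schema maps field name to type; rowField is the Row Field Name property.
    static Map<String, Object> toRecord(byte[] rowKey, Map<String, byte[]> columns,
                                        Map<String, String> schema, String rowField) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : schema.entrySet()) {
            String name = field.getKey();
            // The row field comes from the row key; other fields from same-named columns.
            byte[] raw = name.equals(rowField) ? rowKey : columns.get(name);
            // Missing columns become null; columns not in the schema are never read.
            record.put(name, raw == null ? null : decode(field.getValue(), raw));
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "long");
        schema.put("name", "string");
        schema.put("birthyear", "int");

        Map<String, byte[]> columns = new LinkedHashMap<>();
        columns.put("name", "alice".getBytes(StandardCharsets.UTF_8));
        columns.put("birthyear", ByteBuffer.allocate(4).putInt(1990).array());
        columns.put("email", "a@example.com".getBytes(StandardCharsets.UTF_8)); // ignored

        byte[] rowKey = ByteBuffer.allocate(8).putLong(42L).array();
        System.out.println(toRecord(rowKey, columns, schema, "id"));
    }
}
```

Running main prints {id=42, name=alice, birthyear=1990}; the email column is not part of the schema, so it never appears in the record.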



Created in 2020 by Google Inc.