HBase Batch Source
Plugin version: 2.11.0
Batch source that reads from a column family in an HBase table. This source differs from the Table source in that it does not use a CDAP dataset, but reads directly from HBase.
Use this source when you want to read from a column family in an HBase table. For example, you may want to read from an HBase table, filter out some records, then write the results to a database table.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Reference Name | No | Required. This will be used to uniquely identify this source for lineage, annotating metadata, etc. |
HBase Table Name | Yes | Required. The name of the table to read from. |
HBase Column Family | Yes | Required. The name of the column family to read from. |
Zookeeper Quorum String | Yes | Optional. The ZooKeeper quorum for the HBase instance you are reading from. This should be a comma-separated list of the hosts that make up the quorum. You can find the correct value by looking at the hbase.zookeeper.quorum setting in your hbase-site.xml file. Default is 'localhost'. |
Zookeeper Client Port | Yes | Optional. The client port used to connect to the ZooKeeper quorum. You can find the correct value by looking at the hbase.zookeeper.property.clientPort setting in your hbase-site.xml file. Default is 2181. |
Row Field Name | No | Required. Field name indicating that the field value should come from the row key instead of a row column. The field name specified must be present in the schema, and must not be nullable. |
Output Schema | No | Required. The output schema for the data. |
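The Row Field Name constraint (the field must be present in the schema and must not be nullable) can be checked before a pipeline is deployed. The following is a minimal sketch, not the plugin's own validation code. It assumes the output schema is Avro-style JSON like the one in the example on this page, where a nullable field is written as a union that includes "null". The function name row_field_errors is hypothetical.

```python
import json


def row_field_errors(schema_json: str, row_field: str) -> list:
    """Return a list of problems with the row field; an empty list means it is valid."""
    schema = json.loads(schema_json)
    # Map each field name to its declared type.
    field_types = {f["name"]: f["type"] for f in schema.get("fields", [])}

    if row_field not in field_types:
        return [f"Row field '{row_field}' is not present in the output schema"]

    field_type = field_types[row_field]
    # Assumption: nullable fields are unions (JSON lists) containing "null".
    is_nullable = field_type == "null" or (
        isinstance(field_type, list) and "null" in field_type
    )
    if is_nullable:
        return [f"Row field '{row_field}' must not be nullable"]
    return []


SCHEMA = json.dumps({
    "type": "record",
    "name": "user",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": ["string", "null"]},
        {"name": "birthyear", "type": "int"},
    ],
})

print(row_field_errors(SCHEMA, "id"))     # valid row field
print(row_field_errors(SCHEMA, "name"))   # nullable, so rejected
print(row_field_errors(SCHEMA, "email"))  # not in the schema, so rejected
```

Returning a list of messages rather than raising keeps the check easy to combine with other configuration checks.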
Example
This example reads from the attr column family of an HBase table named users:
Property | Value |
---|---|
Reference Name | |
HBase Table Name | users |
HBase Column Family | attr |
Zookeeper Quorum String | |
Zookeeper Client Port | |
Row Field Name | id |
Output Schema | {"type":"record","name":"user","fields":[{"name":"id","type":"long"},{"name":"name","type":"string"},{"name":"birthyear","type":"int"}]} |
It outputs records with this schema:
field name | type |
---|---|
id | long |
name | string |
birthyear | int |
The id field will be read from the row key of the table. The name field will be read from the name column in the table. The birthyear field will be read from the birthyear column in the table. Any other columns in the table will be ignored by the source.
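The mapping described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the plugin's implementation: it models an HBase row as a row key plus a dictionary of cells, and it assumes values are stored in the big-endian byte layout commonly used by HBase clients (8 bytes for a long, 4 bytes for an int, UTF-8 for a string). The names row_to_record, DECODERS, and the sample data are hypothetical.

```python
import struct

ROW_FIELD = "id"          # field filled from the row key
COLUMN_FAMILY = b"attr"   # column family the source reads from

# Output schema from the example, as (field name, type) pairs.
SCHEMA_FIELDS = [("id", "long"), ("name", "string"), ("birthyear", "int")]

# Assumption: big-endian fixed-width numbers and UTF-8 strings.
DECODERS = {
    "long": lambda raw: struct.unpack(">q", raw)[0],
    "int": lambda raw: struct.unpack(">i", raw)[0],
    "string": lambda raw: raw.decode("utf-8"),
}


def row_to_record(row_key: bytes, cells: dict) -> dict:
    """Build one output record from a row key and its (family, qualifier) cells."""
    record = {}
    for name, type_name in SCHEMA_FIELDS:
        if name == ROW_FIELD:
            raw = row_key  # the row field comes from the row key
        else:
            # Every other field comes from the column with the same name.
            # This sketch raises KeyError if the column is missing.
            raw = cells[(COLUMN_FAMILY, name.encode("utf-8"))]
        record[name] = DECODERS[type_name](raw)
    return record


# Sample row: the email column is not in the schema, so it is ignored.
sample_key = struct.pack(">q", 42)
sample_cells = {
    (b"attr", b"name"): b"Ada",
    (b"attr", b"birthyear"): struct.pack(">i", 1815),
    (b"attr", b"email"): b"ada@example.com",
}

record = row_to_record(sample_key, sample_cells)
print(record)
```

Because the loop is driven by the schema rather than by the cells, columns that the schema does not name never reach the output record.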
Created in 2020 by Google Inc.