-
Notifications
You must be signed in to change notification settings - Fork 127
feat(bigquery): Add support for custom timezones and timestamps #3859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
Just some minor comments.
google-cloud-bigquery/src/test/java/com/google/cloud/bigquery/ExternalTableDefinitionTest.java
Outdated
Show resolved
Hide resolved
google-cloud-bigquery/src/test/java/com/google/cloud/bigquery/LoadJobConfigurationTest.java
Outdated
Show resolved
Hide resolved
* reads the header row as column names and reorders columns to match the field names in the | ||
* schema. | ||
*/ | ||
public abstract Builder setSourceColumnMatch(String sourceColumnMatch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, it's probably useful to have a string enum for: [SOURCE_COLUMN_MATCH_UNSPECIFIED, POSITION, NAME]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code to use enums instead, though oddly enough only the JobConfiguration proto has the UNSPECIFIED
option - the external table only defines POSITION
andNAME
. I made two separate enums for the configuration and external table. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to have 2 separate enums but it does feel weird that the two values are not interchangeable.
@shollyman , do you know if UNSPECIFIED is supposed to be available for external_table also?
@@ -57,6 +57,22 @@ public ExternalDataConfiguration apply(ExternalTableDefinition tableInfo) { | |||
|
|||
private static final long serialVersionUID = -5951580238459622025L; | |||
|
|||
public enum SourceColumnMatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using Google StringEnumType/Value:
For an example, see https://github.com/googleapis/java-bigquery/blob/d4ca3535f54f3282aec133337103bbfa2c9a3653/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/JobStatus.java#L39C1-L40C1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
* reads the header row as column names and reorders columns to match the field names in the | ||
* schema. | ||
*/ | ||
public abstract Builder setSourceColumnMatch(String sourceColumnMatch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to have 2 separate enums but it does feel weird that the two values are not interchangeable.
@shollyman , do you know if UNSPECIFIED is supposed to be available for external_table also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, modulo Phong's suggestion about string enums vs enums.
@@ -57,6 +57,22 @@ public ExternalDataConfiguration apply(ExternalTableDefinition tableInfo) { | |||
|
|||
private static final long serialVersionUID = -5951580238459622025L; | |||
|
|||
public enum SourceColumnMatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This PR adds support for custom timezones, timestamps, datetimes, null markers, and CSV column matching in both the external tables and load job configurations.
Internal issue: b/374142084