catch pyarrow.lib.ArrowTypeError for augment_schema #2129

Open
@j-blackwell

Is your feature request related to a problem? Please describe.
When augment_schema fails to convert a column, the error does not include the name of the problematic field, unlike similar errors elsewhere in the library. Because load_table_from_dataframe calls augment_schema (via dataframe_to_bq_schema), the same uninformative error surfaces there too.

pyarrow.lib.ArrowTypeError: Expected bytes, got a 'float' object

  File "...", line 57, in load_table_bq
    job = client.load_table_from_dataframe(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/google/cloud/bigquery/client.py", line 2781, in load_table_from_dataframe
    new_job_config.schema = _pandas_helpers.dataframe_to_bq_schema(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 491, in dataframe_to_bq_schema
    bq_schema_out = augment_schema(dataframe, bq_schema_out)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 520, in augment_schema
    arrow_table = pyarrow.array(dataframe.reset_index()[field.name])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 360, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 87, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status

Describe the solution you'd like
Fix could be similar to #1836

def augment_schema(dataframe, current_bq_schema):
    ...
    for field in current_bq_schema:
        if field.field_type is not None:
            augmented_schema.append(field)
            continue
        column = dataframe.reset_index()[field.name]
        try:
            arrow_table = pyarrow.array(column)
        except pyarrow.lib.ArrowTypeError as exc:
            # Name the offending column; the dtype comes from the pandas column,
            # since the schema field has no known type at this point.
            msg = f"""Error converting Pandas column with name: "{field.name}" and datatype: "{column.dtype}" to an appropriate pyarrow datatype: ..."""
            _LOGGER.error(msg)
            raise pyarrow.lib.ArrowTypeError(msg) from exc

Happy to submit a PR if this approach would be approved.

Labels: api: bigquery (Issues related to the googleapis/python-bigquery API.)