Raised ArrowTypeError should indicate which column is causing the error #1822

Closed
aaaaahaaaaa opened this issue Feb 20, 2024 · 3 comments · Fixed by #1836
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

aaaaahaaaaa commented Feb 20, 2024

Debugging typing errors when working with existing tables with a large number of columns is extremely frustrating, to say the least: the pyarrow error never indicates which column is causing the issue.

It would be an absolute life saver if more debugging information were raised with the exception.

E.g.:

pyarrow.lib.ArrowTypeError: object of type <class 'str'> cannot be converted to int
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 20, 2024
Contributor

Linchin commented Feb 20, 2024

Thank you @aaaaahaaaaa for raising the issue. This error message seems to be raised directly by pyarrow, but we might be able to wrap it with more useful information. Could you provide a minimal code snippet that reproduces the issue, so I can locate the exact place where the exception is raised?
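
To illustrate the kind of wrapping discussed here, the following is a minimal sketch in plain Python, not the library's actual implementation. The names `ColumnConversionError` and `convert_columns` are hypothetical and chosen only for this example; per-value converter callables stand in for whatever conversion the library performs per column.

```python
from typing import Any, Callable, Dict, List, Mapping, Sequence


class ColumnConversionError(ValueError):
    """Hypothetical error type that records which column failed to convert."""

    def __init__(self, column: str, original: Exception) -> None:
        super().__init__(f"column {column!r}: {original}")
        self.column = column


def convert_columns(
    columns: Mapping[str, Sequence[Any]],
    converters: Mapping[str, Callable[[Any], Any]],
) -> Dict[str, List[Any]]:
    """Convert each column, re-raising any failure with the column name attached."""
    converted: Dict[str, List[Any]] = {}
    for name, values in columns.items():
        try:
            converted[name] = [converters[name](value) for value in values]
        except (TypeError, ValueError) as exc:
            # Chain the original exception so its message and traceback survive.
            raise ColumnConversionError(name, exc) from exc
    return converted
```

With this pattern, `convert_columns({"what": ["ever"]}, {"what": int})` raises a `ColumnConversionError` whose `column` attribute is `"what"`, while the original error stays available as `__cause__`.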

@Linchin Linchin added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Feb 20, 2024
Author

aaaaahaaaaa commented Feb 21, 2024

Here is a minimal example:

import google.auth
import pandas as pd
from google.cloud import bigquery

credentials, project = google.auth.default()

client = bigquery.Client(credentials=credentials, project=project)
table_id = "TABLE_ID"

data = {"what": "ever"}
df = pd.DataFrame([data])

job_config = bigquery.LoadJobConfig(schema=[bigquery.SchemaField("what", "INTEGER")])

job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
job.result()

Error (doesn't show any debugging information referring to the field what):

pyarrow.lib.ArrowInvalid: Could not convert 'ever' with type str: tried to convert to int64
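
As a possible user-side workaround while the error lacks column information, each column can be converted on its own so that failures can be attributed by name. This is a rough sketch under assumptions: `find_failing_columns` and `_arrow_convert` are hypothetical helpers, and the Arrow type aliases (for example `"int64"` for a BigQuery `INTEGER` field, as the error message above suggests) must be supplied by the caller.

```python
from typing import Any, Callable, Dict, Mapping, Sequence


def _arrow_convert(values: Sequence[Any], type_name: str) -> Any:
    # pyarrow is imported lazily so the helper below can also be used
    # with a stand-in converter when pyarrow is not installed.
    import pyarrow as pa

    return pa.array(values, type=pa.type_for_alias(type_name))


def find_failing_columns(
    columns: Mapping[str, Sequence[Any]],
    arrow_types: Mapping[str, str],
    convert: Callable[[Sequence[Any], str], Any] = _arrow_convert,
) -> Dict[str, str]:
    """Return {column name: error message} for every column that fails to convert."""
    failures: Dict[str, str] = {}
    for name, values in columns.items():
        try:
            convert(values, arrow_types[name])
        except (TypeError, ValueError) as exc:
            # pyarrow's ArrowTypeError and ArrowInvalid derive from
            # TypeError and ValueError respectively, so both land here.
            failures[name] = str(exc)
    return failures
```

For a DataFrame `df`, the columns could be passed as `{name: df[name].tolist() for name in df.columns}`. Unlike a single failing call, this collects every problematic column in one pass.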

@chalmerlowe
Contributor

At this time, I am pushing to get PR #1836 finished so we can close this out.

It appears that we have about three PRs, opened by three different people, that are related and focus on similar problems. I am aiming to make #1836 the main focal point and will close out the other two PRs. Unique characteristics of each, where reasonable, will be incorporated into #1836.
