Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query_and_wait use in Cursor.execute breaks default_query_job_config in SQLAlchemy BQ Dialect #1867

Closed
wgunther opened this issue Mar 22, 2024 · 2 comments · Fixed by #1869
Closed
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery API.

Comments

@wgunther
Copy link

wgunther commented Mar 22, 2024

Environment details

  • OS type and version: Ubuntu 20.04.5 LTS
  • Python version: python --version 3.8
  • pip version: pip --version 20.0.2
  • google-cloud-bigquery version: pip show google-cloud-bigquery 3.19.0

Steps to reproduce

  1. Run the code in the example
  2. Check job from query_and_wait -- it is interactive
  3. Check job from query -- it is batch

Code example

import google.cloud.bigquery

job_config = google.cloud.bigquery.QueryJobConfig(
    priority=google.cloud.bigquery.QueryPriority.BATCH,
    labels={"test": "test2"}
)
bq_client = google.cloud.bigquery.Client()
bq_client.query_and_wait("SELECT \"query_and_wait\"", job_config=job_config)
bq_client.query("SELECT \"query\"", job_config=job_config)

Details

query_and_wait does not respect all of the QueryJobConfig options. This is because it seems to go through a different API (jobs.query rather than jobs.insert), and jobs.query doesn't even seem to respect some of these.

The query_and_wait call above ultimately performs a jobs.query call with this payload: {"labels": {"test": "test2"}, "priority": "BATCH", "useLegacySql": false, "formatOptions": {"useInt64Timestamp": true}, "query": "SELECT \\"query_and_wait\\"", "requestId": "..."}

The query call performs this: '{"jobReference": {"jobId": "...", "projectId": "..."}, "configuration": {"query": {"priority": "BATCH", "useLegacySql": false, "query": "SELECT \\"query\\""}, "labels": {"test": "test2"}}}'

Note: priority is not in the jobs.query API, but impact goes beyond this one particular option in QueryJobConfig. Labels happens to work because labels happens to be part of the jobs.query API.

Impact

Although a caller could just migrate from query_and_wait to query if they were making the call, this is not an option with SQLAlchemy.

The Cursor.execute function, from the dbapi, as an optimization, will call query_and_wait without a job ID. This breaks the ability in SQLAlchemy interactions with BigQuery to supply default query job configs, which is explicitly a goal of the library. "For situations like these, or for situations where you want the Client to have a default_query_job_config you can pass many arguments in the query of the connection string."

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Mar 22, 2024
@chalmerlowe
Copy link
Contributor

@tswast

Can you take a look at this?

@tswast
Copy link
Contributor

tswast commented Mar 25, 2024

I find it strange that the backend eats unsupported properties in the request body, but that does appear to be the behavior.

We do have a denylist of properties we know aren't supported by jobs.query here:

def _supported_by_jobs_query(job_config: Optional[job.QueryJobConfig]) -> bool:
"""True if jobs.query can be used. False if jobs.insert is needed."""
if job_config is None:
return True
return (
# These features aren't supported by jobs.query.
job_config.clustering_fields is None
and job_config.destination is None
and job_config.destination_encryption_configuration is None
and job_config.range_partitioning is None
and job_config.table_definitions is None
and job_config.time_partitioning is None
)

Switching this to an allowlist of properties we know are supported should fix this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API.
Projects
None yet
3 participants