Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add job_id, location, project, and query_id properties on RowIterator #1733

Merged
merged 3 commits into from
Nov 18, 2023

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Nov 17, 2023

These can be used to recover the original job metadata when RowIterator is the result of a QueryJob. This will be especially important after #1722 where query_and_wait() won't return an actual QueryJob object.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Towards internal issue 305260214 and related to #589
🦕

…on `RowIterator`

These can be used to recover the original job metadata when `RowIterator` is
the result of a `QueryJob`.
@tswast tswast requested review from a team as code owners November 17, 2023 20:59
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery API. labels Nov 17, 2023
@tswast tswast requested review from Linchin and chalmerlowe and removed request for farhan0102 November 17, 2023 21:00
@@ -1723,7 +1766,7 @@ def to_arrow_iterable(

bqstorage_download = functools.partial(
_pandas_helpers.download_arrow_bqstorage,
self._project,
self._bqstorage_project,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we now have two possible project values, does this mean a RowIterator can correspond to different projects for storage APIs compared to other APIs?

@property
def _bqstorage_project(self) -> Optional[str]:
"""GCP Project ID where BQ Storage API will bill to (if applicable)."""
client = self.client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what's the difference between self.client.project and self._project. When the client isn't of storage type, does self.client.project still reflect the storage project id?

Copy link
Contributor Author

@tswast tswast Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference in other contexts is usually described as "billing project" versus "data project". I've renamed this private property to reflect that. This is easiest to understand in the context of public datasets.

I have permission to query tables in the bigquery-public-data project and even download the data in bulk with the BigQuery Storage Read API, but I don't have permission to run queries in that project or start a BigQuery Storage Read API session from that project. Instead, I have start the query or bq storage read session in my own project and reference the tables in the other project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I see! so self._project is the project for the data/table, and self.client.project a.k.a. billing project is the user's own project. So I guess this billing project doesn't just apply to storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, though for queries we're using jobs.getQueryResults, so the project will always be the billing project since that's where temp results for queries are stored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tabledata.list is free, but technically we should be overriding the x-goog-user-project header so that the quota is charged to the billing project. (I don't think we're actually doing that, though)

@Linchin Linchin merged commit 494f275 into main Nov 18, 2023
20 checks passed
@Linchin Linchin deleted the b305260214-rowiterator-job-metadata branch November 18, 2023 00:44
@@ -2113,6 +2113,38 @@ def test_constructor_with_dict_schema(self):
]
self.assertEqual(iterator.schema, expected_schema)

def test_job_id_missing(self):
rows = self._make_one()
self.assertIsNone(rows.job_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tswast

Just curious about testing philosophy.
For this collection of tests that are pretty much the same (i.e. all the tests that .assertIsNone()) is there a benefit in your mind to having each test broken out separately versus providing a parameterized list of inputs and expected outputs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I matched the existing pattern in this test. I do think we could probably combine a lot of these into a single parametrized test.

I worry slightly about the required getattr(property_name) being too complex for new contributors to understand. In general tests should avoid anything even as complex as an if statement. https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html In this case, getattr is a grey area. I wouldn't necessarily count it as "logic"

kiraksi pushed a commit to kiraksi/python-bigquery that referenced this pull request Nov 20, 2023
…on `RowIterator` (googleapis#1733)

* feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator`

These can be used to recover the original job metadata when `RowIterator` is
the result of a `QueryJob`.

* rename bqstorage_project to billing project

* Update google/cloud/bigquery/table.py

Co-authored-by: Lingqing Gan <[email protected]>

---------

Co-authored-by: Lingqing Gan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants