Faster INSERT methods #497

Open
@JoaoPaes-at-Dynamox

Description

Is your feature request related to a problem? Please describe.
I need to work with large numbers of rows in an ETL service and wanted to use the SQLAlchemy ORM backed by the BigQuery dialect to write more Pythonic code, but uploading anything more than a few tens of rows is extremely slow.

Describe the solution you'd like
Be able to use an ORM-enabled INSERT as the SQLAlchemy docs suggest, or at least the soon-to-be-legacy Session.bulk_insert_mappings, and get reasonable performance.
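
For reference, the kind of ORM-enabled bulk INSERT described in the SQLAlchemy 2.0 docs looks roughly like the sketch below; the `Measurement` model and the project/dataset names are placeholders I made up for illustration:

```python
# Minimal sketch of an ORM-enabled bulk INSERT (SQLAlchemy 2.0 style).
# The Measurement model and the project/dataset names are placeholders.
from sqlalchemy import create_engine, insert
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session


class Base(DeclarativeBase):
    pass


class Measurement(Base):
    __tablename__ = "measurements"

    id: Mapped[int] = mapped_column(primary_key=True)
    value: Mapped[float]


engine = create_engine("bigquery://my-project/my_dataset")

# A few thousand rows as plain dictionaries.
rows = [{"id": i, "value": float(i)} for i in range(4_000)]

with Session(engine) as session:
    # One INSERT statement executed with a list of parameter dicts; SQLAlchemy
    # can batch this via executemany on dialects that support it.
    session.execute(insert(Measurement), rows)
    session.commit()
```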

Describe alternatives you've considered
I've tried every method described in this SQLAlchemy FAQ entry, and none took less than about 900 seconds for as few as 200 rows.
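
One of those FAQ methods, Session.bulk_insert_mappings(), is sketched below (reusing the hypothetical Measurement model, engine, and rows from the sketch above), and it was no faster in my tests:

```python
# Sketch of Session.bulk_insert_mappings(), reusing the hypothetical
# Measurement model, engine, and rows from the sketch above.
from sqlalchemy.orm import Session

with Session(engine) as session:
    # Skips most ORM unit-of-work overhead and hands the dictionaries
    # straight to an executemany-style INSERT.
    session.bulk_insert_mappings(Measurement, rows)
    session.commit()
```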

Additional context
While the examples show that methods like bulk_insert_mappings can insert hundreds of thousands of rows in about 0.4 seconds, with the BigQuery dialect I've seen it take roughly 2 h 30 min to upload about 4k rows. I don't expect BigQuery to come anywhere near 0.4 seconds for 100k rows, but something around 5 to 6 minutes should be possible, since that is about the average time the BigQuery client from google.cloud.bigquery needs to upload roughly 90k rows.
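
For comparison, the google.cloud.bigquery baseline I have in mind is roughly the sketch below; the table id is a placeholder and rows is the same list of dictionaries as above:

```python
# Rough sketch of the google.cloud.bigquery client upload used as a baseline.
# The table id is a placeholder; `rows` is a list of dictionaries.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.measurements"

# A single load job instead of row-by-row INSERT statements.
load_job = client.load_table_from_json(rows, table_id)
load_job.result()  # block until the load job completes
```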

Metadata

Labels

api: bigquery (Issues related to the googleapis/python-bigquery-sqlalchemy API)
priority: p3 (Desirable enhancement or fix. May not be included in next release)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design)
