Fix PostgresToGCSOperator does not allow nested JSON #23063

Merged


pierrejeambrun
Member

@pierrejeambrun pierrejeambrun commented Apr 18, 2022

fixes: #23040

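This is caused by json.dumps being applied twice when exporting to JSON format: dict values are first serialized to strings, and the whole row is then serialized again.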

I added a parameter to convert_type that lets us choose how dict objects are handled. For Parquet and CSV we want to stringify them, but when exporting to JSON we want to keep them as dicts.

The CSV and Parquet exports are therefore unchanged.

I added data to the tests so that they assert the export of JSON columns.
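
To illustrate the double encoding described above, here is a minimal standalone sketch. It is not the operator's actual code: the function names, the `stringify_dict` parameter name, and the sample row are assumptions made for illustration only.

```python
import json


def convert_type(value, stringify_dict=True):
    """Hypothetical stand-in for the operator's type conversion step."""
    if isinstance(value, dict) and stringify_dict:
        # CSV and Parquet exports want a plain string, so dump the dict here.
        return json.dumps(value)
    return value


def export_row_as_json(row, stringify_dict):
    """Serialize one row (column name -> value) as a single JSON line."""
    converted = {name: convert_type(v, stringify_dict) for name, v in row.items()}
    return json.dumps(converted, sort_keys=True)


# Illustrative row with a nested JSON column.
row = {"id": 1, "payload": {"a": {"b": 2}}}

# Dict dumped in convert_type and again with the row: payload ends up an escaped string.
double_encoded = export_row_as_json(row, stringify_dict=True)

# Dict kept as-is until the single final dump: payload stays a nested object.
nested = export_row_as_json(row, stringify_dict=False)

print(double_encoded)
print(nested)
```

Parsing `double_encoded` back gives a string in the `payload` column, while parsing `nested` gives the original nested dict, which is the behavior wanted for the JSON export format.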

Regards,

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Apr 18, 2022
@pierrejeambrun pierrejeambrun force-pushed the 23040-postgres-to-gcs-operator-nested-json branch from bdfcdf2 to 1ca6b51 Compare April 18, 2022 22:00
@pierrejeambrun
Member Author

Hello @eladkal,

Here is a first draft of the PR, just in case you want to take a look :)

Best,

@eladkal eladkal changed the title PostgresToGCSOperator does not allow nested JSON Fix PostgresToGCSOperator does not allow nested JSON May 3, 2022
@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label May 3, 2022
@github-actions

github-actions bot commented May 3, 2022

The PR is likely OK to be merged with just a subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit 766726f into apache:main May 8, 2022
@pierrejeambrun pierrejeambrun deleted the 23040-postgres-to-gcs-operator-nested-json branch May 9, 2022 07:23
@eladkal
Contributor

eladkal commented May 9, 2022

Thanks @pierrejeambrun, hopefully someday we will have #21599 resolved so we won't need dedicated operators per db

@pierrejeambrun
Member Author

pierrejeambrun commented May 9, 2022

@eladkal, my pleasure. As a matter of fact, I was wondering about that, as there are strong similarities in some parts of the code.

Good to know there already is an issue tracking this, it's not going to be an easy one though 😄

Development

Successfully merging this pull request may close these issues.

PostgresToGCSOperator does not allow nested JSON
3 participants