Introduce 'transfers' packages #9320
Conversation
Force-pushed from bd8d19a to aaefc15
Can you review the file operators-and-hooks-ref.rst? It has had sections with transfer operators for a long time, so it's easy to spot a bug there. I think we should also add the transfer packages to the checks.
Force-pushed from aaefc15 to 45576d8
Transfer operators have consistent names and are grouped in the 'transfer' packages.
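For illustration, a minimal sketch of what the regrouping means for DAG imports, based on the module paths touched in this PR (the S3ToGCSOperator class name is my assumption and may not match the provider exactly):

# Before this PR, transfer operators lived in each provider's "operators" package:
# from airflow.providers.google.cloud.operators.s3_to_gcs import S3ToGCSOperator

# After this PR, they are grouped in a dedicated "transfers" package:
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator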
Force-pushed from 59e5c2f to 2b99af7
@mik-laj -> I think all the concerns are addressed now. I also reviewed operators-and-hooks-ref.rst and it all looks fine now.
docs/operators-and-hooks-ref.rst (outdated)
-   - :mod:`airflow.providers.google.cloud.operators.s3_to_gcs`,
-     :mod:`airflow.providers.google.cloud.operators.cloud_storage_transfer_service`
+   - :mod:`airflow.providers.google.cloud.transfers.s3_to_gcs`,
+     :mod:`airflow.providers.google.cloud.transfers.cloud_storage_transfer_service`
Can you check it? Is it OK?
What do you mean by OK? It looks good. What do you suspect is wrong here?
Cloud transfer service operators should be in the operators package.
@mik-laj -> all corrected; the last comment is the only one I am not sure about.
@feluelle - one more chance to take a look before I merge :)
Force-pushed from 8c484a9 to 5b894df
Added more descriptions for transfer packages in CONTRIBUTING.rst and added the transfer packages to the consistency checks (added two missing modules and one missing howto).
@@ -24,7 +24,7 @@
 from airflow.utils.decorators import apply_defaults


-class S3ToMySqlTransferOperator(BaseOperator):
+class S3ToMySqlOperator(BaseOperator):
     """
     Loads a file from S3 into a MySQL table.
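For context, a hedged sketch of how the renamed operator might be used in a DAG. The module path and parameter names (s3_source_key, mysql_table, mysql_duplicate_key_handling, aws_conn_id, mysql_conn_id) are assumptions drawn from memory of the MySQL provider and are not confirmed by this thread:

from datetime import datetime

from airflow import DAG
# Assumed post-rename module path; verify against your providers package.
from airflow.providers.mysql.transfers.s3_to_mysql import S3ToMySqlOperator

with DAG(
    dag_id="example_s3_to_mysql",          # hypothetical DAG id
    start_date=datetime(2020, 6, 1),
    schedule_interval=None,
) as dag:
    load_file = S3ToMySqlOperator(
        task_id="load_s3_file_into_mysql",
        s3_source_key="s3://my-bucket/data.csv",  # hypothetical source object
        mysql_table="my_table",                   # hypothetical target table
        mysql_duplicate_key_handling="IGNORE",
        aws_conn_id="aws_default",
        mysql_conn_id="mysql_default",
    )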
Can I ask what the required S3 file format is for this operator? JSON? CSV? Parquet, etc.?
It calls bulk_load_custom, which uses LOAD DATA.
The LOAD DATA statement reads rows from a text file into a table at a very high speed.
You can define something like:
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
for CSV files.
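To tie this back to the operator: those FIELDS/LINES clauses can be passed through to LOAD DATA as a free-form options string. A rough sketch, assuming MySqlHook.bulk_load_custom takes the table name, the local file path, a duplicate-key handling mode, and an extra-options string (signature not confirmed in this thread):

from airflow.providers.mysql.hooks.mysql import MySqlHook

hook = MySqlHook(mysql_conn_id="mysql_default")
# Hypothetical call: extra_options is appended to the generated LOAD DATA statement,
# which is where the CSV-specific FIELDS/LINES clauses would go.
hook.bulk_load_custom(
    table="my_table",
    tmp_file="/tmp/data.csv",
    duplicate_key_handling="IGNORE",
    extra_options="""
        FIELDS TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
    """,
)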
Thanks @feluelle, this helped a lot!
One more question! Does this work with parquet files? After looking into the docs, I don't think it does. Right?
No, I don't think it works directly. You would have to convert the Parquet file to CSV and load the CSV into MySQL.
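As a rough illustration of that workaround (pandas is not part of the operator; this is just one way to do the conversion upstream, and it needs a Parquet engine such as pyarrow installed):

import pandas as pd

# Convert the Parquet file to CSV before handing it to the MySQL load step.
df = pd.read_parquet("/tmp/data.parquet")   # hypothetical local path
df.to_csv("/tmp/data.csv", index=False)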
Make sure to mark the boxes below before creating PR: [x]
- In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
- In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
- In case of backwards incompatible changes please leave a note in UPDATING.md.
- Read the Pull Request Guidelines for more information.