Introduce 'transfers' packages #9320


Merged
6 commits merged on Jun 16, 2020

Conversation


@potiuk potiuk commented Jun 15, 2020


Make sure to mark the boxes below before creating PR:

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@potiuk potiuk force-pushed the backport-packages-rename branch 5 times, most recently from bd8d19a to aaefc15 on June 16, 2020 02:39

mik-laj commented Jun 16, 2020

Can you review the file operators-and-hooks-ref.rst? It has had sections with transfer operators for a long time, so it is easy to spot a bug there.
Example:
:mod:`airflow.providers.amazon.aws.operators.sftp_to_s3` in Transfer operators and hooks

I think we should also add transfer package to check
https://github.com/PolideaInternal/airflow/blob/backport-packages-rename/docs/build#L171
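For illustration only — the real check belongs in the docs/build script linked above, and this helper's name and logic are invented — a minimal sketch of such a consistency check could look like:

```python
from pathlib import Path


def misplaced_transfer_modules(providers_root: str) -> list:
    """Find '<source>_to_<target>.py' modules that live outside a 'transfers' package.

    Transfer modules follow the '<source>_to_<target>' naming convention, so any
    such file still sitting outside a 'transfers' directory is suspect.
    """
    misplaced = []
    for path in Path(providers_root).rglob("*_to_*.py"):
        if "transfers" not in path.parts:
            misplaced.append(path.as_posix())
    return sorted(misplaced)
```

Running this over the providers tree would flag any transfer module the rename missed.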

@potiuk potiuk force-pushed the backport-packages-rename branch from aaefc15 to 45576d8 on June 16, 2020 10:33
potiuk added 2 commits June 16, 2020 15:12
Transfer operators have consistent names and are grouped in
the 'transfer' packages.
@potiuk potiuk force-pushed the backport-packages-rename branch from 59e5c2f to 2b99af7 on June 16, 2020 13:13

potiuk commented Jun 16, 2020

Can you review the file operators-and-hooks-ref.rst? It has had sections with transfer operators for a long time, so it is easy to spot a bug there.
Example:
:mod:`airflow.providers.amazon.aws.operators.sftp_to_s3` in Transfer operators and hooks

I think we should also add transfer package to check
https://github.com/PolideaInternal/airflow/blob/backport-packages-rename/docs/build#L171

@mik-laj -> I think all the concerns are addressed now. I also reviewed operators-and-hooks.rst and it looks like everything should be fine now.

    - - :mod:`airflow.providers.google.cloud.operators.s3_to_gcs`,
        :mod:`airflow.providers.google.cloud.operators.cloud_storage_transfer_service`
    + - :mod:`airflow.providers.google.cloud.transfers.s3_to_gcs`,
        :mod:`airflow.providers.google.cloud.transfers.cloud_storage_transfer_service`

@mik-laj mik-laj Jun 16, 2020


Can you check it? Is it ok?

Member Author


What do you mean by OK? It looks good. What do you suspect is wrong here?

Member


Cloud transfer service operators should be in operators package.
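The convention settled on here — only point-to-point `<source>_to_<target>` modules move to the transfers package, while service operators such as cloud_storage_transfer_service stay under operators — can be sketched as a hypothetical rename rule (the function is illustrative only, not part of the PR):

```python
def to_transfers_path(module_path: str) -> str:
    """Rewrite an old-style '...operators.<x>_to_<y>' module path to the new
    '...transfers.<x>_to_<y>' location; service operators are left untouched."""
    parts = module_path.split(".")
    # Only modules following the '<source>_to_<target>' naming convention move.
    if "_to_" in parts[-1] and "operators" in parts:
        parts[parts.index("operators")] = "transfers"
    return ".".join(parts)
```

Note that `cloud_storage_transfer_service` contains no `_to_` segment, so a rule like this leaves it under operators, matching the comment above.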


potiuk commented Jun 16, 2020

@mik-laj -> all corrected, just the last comment I am not sure about


potiuk commented Jun 16, 2020

@feluelle - one more chance to take a look before I merge :)

@potiuk potiuk changed the title Introduce transfer packages Introduce 'transfers' packages Jun 16, 2020
@potiuk potiuk force-pushed the backport-packages-rename branch from 8c484a9 to 5b894df on June 16, 2020 17:20

potiuk commented Jun 16, 2020

Added more descriptions for transfer packages in CONTRIBUTING.rst and added transfer packages to be checked for consistency (added two missing modules and one missing howto).

@potiuk potiuk merged commit f6bd817 into apache:master Jun 16, 2020
@potiuk potiuk deleted the backport-packages-rename branch June 16, 2020 20:55
    @@ -24,7 +24,7 @@
     from airflow.utils.decorators import apply_defaults


    -class S3ToMySqlTransferOperator(BaseOperator):
    +class S3ToMySqlOperator(BaseOperator):
         """
         Loads a file from S3 into a MySQL table.

Can I ask what the required S3 file format for this operator is? JSON? CSV? Parquet, etc.?


@feluelle feluelle Jul 30, 2020


It calls bulk_load_custom, which uses LOAD DATA.

Member


The LOAD DATA statement reads rows from a text file into a table at a very high speed.

You can define something like:

    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]

for csv files.
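As an illustration, a hypothetical helper (not part of the operator; the name and defaults are assumptions) that assembles such clauses into the extra options string eventually embedded in the LOAD DATA statement:

```python
def csv_extra_options(delimiter: str = ",", skip_header: bool = True) -> str:
    """Build LOAD DATA clauses for a comma-separated CSV file.

    The returned string follows the FIELDS / LINES / IGNORE clause order
    that the MySQL LOAD DATA syntax expects.
    """
    options = [
        f"FIELDS TERMINATED BY '{delimiter}'",
        "OPTIONALLY ENCLOSED BY '\"'",
        "LINES TERMINATED BY '\\n'",
    ]
    if skip_header:
        # Skip the header row of the CSV file.
        options.append("IGNORE 1 LINES")
    return "\n".join(options)
```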


Thanks @feluelle, this helped a lot!


One more question! Does this work with parquet files? After looking into the docs, I don't think it does. Right?

Member


No, I don't think it works directly. You would have to convert the parquet file to CSV and load the CSV into MySQL.
