Introduce 'transfers' packages (#9320)
* Consistent naming of transfer operators

Transfer operators have consistent names and are grouped in
the 'transfer' packages.

* fixup! Consistent naming of transfer operators

* Introduces 'transfers' packages.

Closes #9161 and #8620

* fixup! Introduces 'transfers' packages.

* fixup! fixup! Introduces 'transfers' packages.

* fixup! fixup! fixup! Introduces 'transfers' packages.
potiuk committed Jun 16, 2020
1 parent c78e2a5 commit f6bd817
Showing 343 changed files with 2,612 additions and 1,544 deletions.
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,8 @@ metastore_browser/templates/.*\\.html$|.*\\.jinja2"
(?x)
^airflow/providers/apache/cassandra/hooks/cassandra.py$|
^airflow/providers/apache/hive/operators/hive_stats.py$|
^airflow/providers/apache/hive/PROVIDERS_CHANGES_*|
^airflow/providers/apache/hive/README.md$|
^tests/providers/apache/cassandra/hooks/test_cassandra.py
- id: consistent-pylint
language: pygrep
68 changes: 66 additions & 2 deletions CONTRIBUTING.rst
Original file line number Diff line number Diff line change
Expand Up @@ -542,8 +542,71 @@ We support the following types of tests:

For details on running different types of Airflow tests, see `TESTING.rst <TESTING.rst>`_.


Naming Conventions for provider packages
========================================

In Airflow 2.0 we standardized naming for provider packages, modules and classes.
These rules (introduced as AIP-21) are enforced by automated checks
that verify the naming conventions are followed. Here is a brief summary of the rules; for a
detailed discussion, see `AIP-21: Changes in import paths <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>`_.

The rules are as follows:

* Provider packages are all placed in 'airflow.providers'

* Providers are usually direct sub-packages of the 'airflow.providers' package, but in some cases they can be
further split into sub-packages (for example, the 'apache' package has 'cassandra', 'druid' ... providers), out
of which several different provider packages are produced (apache.cassandra, apache.druid). This is the
case when the providers are connected under a common umbrella but are very loosely coupled at the code level.

* In some cases the package can have sub-packages, but they are all delivered as a single provider
package (for example, the 'google' package contains 'ads', 'cloud' etc. sub-packages). This is the case when
the providers are connected under a common umbrella and are also tightly coupled at the code level.

* Typical structure of provider package:
* example_dags -> example DAGs are stored here (used for documentation and System Tests)
* hooks -> hooks are stored here
* operators -> operators are stored here
* sensors -> sensors are stored here
* secrets -> secret backends are stored here
* transfers -> transfer operators are stored here

* Module names do not contain the words "hooks", "operators" etc. The type comes from
the package. For example, the 'hooks.datastore' module contains the DataStore hook and 'operators.datastore'
contains DataStore operators.

* Class names contain 'Operator', 'Hook', 'Sensor' - for example DataStoreHook, DataStoreExportOperator

* Operator names usually follow the convention <Subject><Action><Entity>Operator;
BigQueryExecuteQueryOperator is a good example

* Transfer Operators are those that actively move data from one service/provider to another
service (which might belong to the same or another provider). This usually involves two hooks. The convention
for those is <Source>To<Destination>Operator. They are not named *TransferOperator nor *Transfer.
* Operators that use an external service to perform the transfer (for example, CloudDataTransferService operators)
are not placed in the "transfers" package and do not have to follow the naming convention for
transfer operators.

* It is often debatable where to put transfer operators, but we agreed on the following criteria:

* We use "maintainability" of the operators as the main criterion, so the transfer operator
should be kept at the provider which has the highest "interest" in the transfer operator

* For Cloud Providers or Service providers that usually means that the transfer operators
should land at the "target" side of the transfer

* Secret Backend name follows the convention: <SecretEngine>Backend.

* Tests are grouped in parallel packages under the "tests.providers" top-level package. The module name is usually
"test_<object_to_test>.py".

* System tests (not yet fully automated, but allowing e2e testing of a particular provider) are
named with the _system.py suffix.
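
The transfer operator naming rule above can be illustrated with a small check. This is only a sketch: `parse_transfer_name` and the regular expression are hypothetical helpers written for illustration, not part of Airflow or of the automated checks mentioned earlier.

```python
import re

# Hypothetical pattern for <Source>To<Destination>Operator. The source part is
# matched lazily so that the first "To" followed by a capital letter splits
# the name.
TRANSFER_NAME = re.compile(
    r"^(?P<source>[A-Z][A-Za-z0-9]*?)To(?P<destination>[A-Z][A-Za-z0-9]*)Operator$"
)


def parse_transfer_name(class_name):
    """Return (source, destination) if the name follows the convention, else None."""
    # Names ending in "TransferOperator" are explicitly excluded by the rules above.
    if class_name.endswith("TransferOperator"):
        return None
    match = TRANSFER_NAME.match(class_name)
    if match is None:
        return None
    return match.group("source"), match.group("destination")


print(parse_transfer_name("BigQueryToGCSOperator"))
print(parse_transfer_name("DatastoreExportOperator"))
```

Under these assumptions, `BigQueryToGCSOperator` splits into `BigQuery` and `GCS`, while `DatastoreExportOperator` is not recognised as a transfer operator at all.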

Metadata Database Updates
==============================
=========================

When developing features, you may need to persist information to the metadata
database. Airflow has `Alembic <https://github.com/sqlalchemy/alembic>`__ built-in
Expand Down Expand Up @@ -623,7 +686,7 @@ could get a reproducible build. See the `Yarn docs


Generate Bundled Files with yarn
----------------------------------
--------------------------------

To parse and generate bundled files for Airflow, run either of the following
commands:
Expand Down Expand Up @@ -910,6 +973,7 @@ You can join the channels via links at the `Airflow Community page <https://airf
* The deprecated `JIRA issues <https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-4470?filter=allopenissues>`_ for:
* checking out old but still valuable issues that are not on Github yet
* mentioning the JIRA issue number in the title of the related PR you would like to open on Github

**IMPORTANT**
We don't create new issues on JIRA anymore. The reason we still look at JIRA issues is that there are valuable tickets inside of it. However, each new PR should be created on `Github issues <https://github.com/apache/airflow/issues>`_ as stated in `Contribution Workflow Example <https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example>`_

4 changes: 2 additions & 2 deletions UPDATING.md
Original file line number Diff line number Diff line change
Expand Up @@ -912,7 +912,7 @@ The following table shows changes in import paths.
|airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateOperator |airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateWorkflowTemplateOperator |
|airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator |airflow.providers.google.cloud.operators.datastore.DatastoreExportOperator |
|airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator |airflow.providers.google.cloud.operators.datastore.DatastoreImportOperator |
|airflow.contrib.operators.file_to_gcs.FileToGoogleCloudStorageOperator |airflow.providers.google.cloud.operators.local_to_gcs.FileToGoogleCloudStorageOperator |
|airflow.contrib.operators.file_to_gcs.FileToGoogleCloudStorageOperator |airflow.providers.google.cloud.transfers.local_to_gcs.FileToGoogleCloudStorageOperator |
|airflow.contrib.operators.gcp_bigtable_operator.BigtableClusterUpdateOperator |airflow.providers.google.cloud.operators.bigtable.BigtableUpdateClusterOperator |
|airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceCreateOperator |airflow.providers.google.cloud.operators.bigtable.BigtableCreateInstanceOperator |
|airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceDeleteOperator |airflow.providers.google.cloud.operators.bigtable.BigtableDeleteInstanceOperator |
Expand Down Expand Up @@ -1006,7 +1006,7 @@ The following table shows changes in import paths.
|airflow.contrib.operators.gcs_acl_operator.GoogleCloudStorageBucketCreateAclEntryOperator |airflow.providers.google.cloud.operators.gcs.GCSBucketCreateAclEntryOperator |
|airflow.contrib.operators.gcs_acl_operator.GoogleCloudStorageObjectCreateAclEntryOperator |airflow.providers.google.cloud.operators.gcs.GCSObjectCreateAclEntryOperator |
|airflow.contrib.operators.gcs_delete_operator.GoogleCloudStorageDeleteOperator |airflow.providers.google.cloud.operators.gcs.GCSDeleteObjectsOperator |
|airflow.contrib.operators.gcs_download_operator.GoogleCloudStorageDownloadOperator |airflow.providers.google.cloud.operators.gcs.GCSToLocalOperator |
|airflow.contrib.operators.gcs_download_operator.GoogleCloudStorageDownloadOperator |airflow.providers.google.cloud.operators.gcs.GCSToLocalFilesystemOperator |
|airflow.contrib.operators.gcs_list_operator.GoogleCloudStorageListOperator |airflow.providers.google.cloud.operators.gcs.GCSListObjectsOperator |
|airflow.contrib.operators.gcs_operator.GoogleCloudStorageCreateBucketOperator |airflow.providers.google.cloud.operators.gcs.GCSCreateBucketOperator |
|airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator |airflow.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator |
10 changes: 5 additions & 5 deletions airflow/contrib/operators/adls_to_gcs.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,28 +15,28 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""This module is deprecated. Please use `airflow.providers.google.cloud.operators.adls_to_gcs`."""
"""This module is deprecated. Please use `airflow.providers.google.cloud.transfers.adls_to_gcs`."""

import warnings

from airflow.providers.google.cloud.operators.adls_to_gcs import ADLSToGCSOperator
from airflow.providers.google.cloud.transfers.adls_to_gcs import ADLSToGCSOperator

warnings.warn(
"This module is deprecated. Please use `airflow.providers.google.cloud.operators.adls_to_gcs`.",
"This module is deprecated. Please use `airflow.providers.google.cloud.transfers.adls_to_gcs`.",
DeprecationWarning, stacklevel=2
)


class AdlsToGoogleCloudStorageOperator(ADLSToGCSOperator):
"""
This class is deprecated.
Please use `airflow.providers.google.cloud.operators.adls_to_gcs.ADLSToGCSOperator`.
Please use `airflow.providers.google.cloud.transfers.adls_to_gcs.ADLSToGCSOperator`.
"""

def __init__(self, *args, **kwargs):
warnings.warn(
"""This class is deprecated.
Please use `airflow.providers.google.cloud.operators.adls_to_gcs.ADLSToGCSOperator`.""",
Please use `airflow.providers.google.cloud.transfers.adls_to_gcs.ADLSToGCSOperator`.""",
DeprecationWarning, stacklevel=2
)
super().__init__(*args, **kwargs)
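
The shim above keeps the old class importable while steering users to the new location. Its behaviour can be sketched without Airflow installed; `NewOperator`, `OldOperator` and `instantiate_and_capture` below are hypothetical stand-ins, and only the `warnings` calls are real standard-library API.

```python
import warnings


class NewOperator:
    """Stand-in for the relocated operator (hypothetical, not an Airflow class)."""

    def __init__(self, task_id):
        self.task_id = task_id


class OldOperator(NewOperator):
    """Stand-in for the deprecated alias; mirrors the shim pattern in this file."""

    def __init__(self, *args, **kwargs):
        # stacklevel=2 attributes the warning to the caller, not to this shim.
        warnings.warn(
            "This class is deprecated. Please use `NewOperator`.",
            DeprecationWarning, stacklevel=2
        )
        super().__init__(*args, **kwargs)


def instantiate_and_capture(cls, *args, **kwargs):
    """Build an instance and return it with any DeprecationWarnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide the warning
        instance = cls(*args, **kwargs)
    deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
    return instance, deprecations


old, old_warnings = instantiate_and_capture(OldOperator, task_id="copy_files")
new, new_warnings = instantiate_and_capture(NewOperator, task_id="copy_files")
print(len(old_warnings), len(new_warnings))
```

The deprecated alias still behaves like the new class, so existing DAGs keep working; the only observable difference in this sketch is the warning emitted on construction.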
6 changes: 3 additions & 3 deletions airflow/contrib/operators/bigquery_to_bigquery.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,14 +15,14 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""This module is deprecated. Please use `airflow.providers.google.cloud.operators.bigquery_to_bigquery`."""
"""This module is deprecated. Please use `airflow.providers.google.cloud.transfers.bigquery_to_bigquery`."""

import warnings

# pylint: disable=unused-import
from airflow.providers.google.cloud.operators.bigquery_to_bigquery import BigQueryToBigQueryOperator # noqa
from airflow.providers.google.cloud.transfers.bigquery_to_bigquery import BigQueryToBigQueryOperator # noqa

warnings.warn(
"This module is deprecated. Please use `airflow.providers.google.cloud.operators.bigquery_to_bigquery`.",
"This module is deprecated. Please use `airflow.providers.google.cloud.transfers.bigquery_to_bigquery`.",
DeprecationWarning, stacklevel=2
)
10 changes: 5 additions & 5 deletions airflow/contrib/operators/bigquery_to_gcs.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,28 +15,28 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""This module is deprecated. Please use `airflow.providers.google.cloud.operators.bigquery_to_gcs`."""
"""This module is deprecated. Please use `airflow.providers.google.cloud.transfers.bigquery_to_gcs`."""

import warnings

from airflow.providers.google.cloud.operators.bigquery_to_gcs import BigQueryToGCSOperator
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

warnings.warn(
"This module is deprecated. Please use `airflow.providers.google.cloud.operators.bigquery_to_gcs`.",
"This module is deprecated. Please use `airflow.providers.google.cloud.transfers.bigquery_to_gcs`.",
DeprecationWarning, stacklevel=2
)


class BigQueryToCloudStorageOperator(BigQueryToGCSOperator):
"""
This class is deprecated.
Please use `airflow.providers.google.cloud.operators.bigquery_to_gcs.BigQueryToGCSOperator`.
Please use `airflow.providers.google.cloud.transfers.bigquery_to_gcs.BigQueryToGCSOperator`.
"""

def __init__(self, *args, **kwargs):
warnings.warn(
"""This class is deprecated.
Please use `airflow.providers.google.cloud.operators.bigquery_to_gcs.BigQueryToGCSOperator`.""",
Please use `airflow.providers.google.cloud.transfers.bigquery_to_gcs.BigQueryToGCSOperator`.""",
DeprecationWarning, stacklevel=2
)
super().__init__(*args, **kwargs)
6 changes: 3 additions & 3 deletions airflow/contrib/operators/bigquery_to_mysql_operator.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,14 +15,14 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""This module is deprecated. Please use `airflow.providers.google.cloud.operators.bigquery_to_mysql`."""
"""This module is deprecated. Please use `airflow.providers.google.cloud.transfers.bigquery_to_mysql`."""

import warnings

# pylint: disable=unused-import
from airflow.providers.google.cloud.operators.bigquery_to_mysql import BigQueryToMySqlOperator # noqa
from airflow.providers.google.cloud.transfers.bigquery_to_mysql import BigQueryToMySqlOperator # noqa

warnings.warn(
"This module is deprecated. Please use `airflow.providers.google.cloud.operators.bigquery_to_mysql`.",
"This module is deprecated. Please use `airflow.providers.google.cloud.transfers.bigquery_to_mysql`.",
DeprecationWarning, stacklevel=2
)
10 changes: 5 additions & 5 deletions airflow/contrib/operators/cassandra_to_gcs.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,29 +16,29 @@
# specific language governing permissions and limitations
# under the License.
"""
This module is deprecated. Please use `airflow.providers.google.cloud.operators.cassandra_to_gcs`.
This module is deprecated. Please use `airflow.providers.google.cloud.transfers.cassandra_to_gcs`.
"""

import warnings

from airflow.providers.google.cloud.operators.cassandra_to_gcs import CassandraToGCSOperator
from airflow.providers.google.cloud.transfers.cassandra_to_gcs import CassandraToGCSOperator

warnings.warn(
"This module is deprecated. Please use `airflow.providers.google.cloud.operators.cassandra_to_gcs`.",
"This module is deprecated. Please use `airflow.providers.google.cloud.transfers.cassandra_to_gcs`.",
DeprecationWarning, stacklevel=2
)


class CassandraToGoogleCloudStorageOperator(CassandraToGCSOperator):
"""
This class is deprecated.
Please use `airflow.providers.google.cloud.operators.cassandra_to_gcs.CassandraToGCSOperator`.
Please use `airflow.providers.google.cloud.transfers.cassandra_to_gcs.CassandraToGCSOperator`.
"""

def __init__(self, *args, **kwargs):
warnings.warn(
"""This class is deprecated.
Please use `airflow.providers.google.cloud.operators.cassandra_to_gcs.CassandraToGCSOperator`.""",
Please use `airflow.providers.google.cloud.transfers.cassandra_to_gcs.CassandraToGCSOperator`.""",
DeprecationWarning, stacklevel=2
)
super().__init__(*args, **kwargs)
6 changes: 3 additions & 3 deletions airflow/contrib/operators/dynamodb_to_s3.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,14 +15,14 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""This module is deprecated. Please use `airflow.providers.amazon.aws.operators.dynamodb_to_s3`."""
"""This module is deprecated. Please use `airflow.providers.amazon.aws.transfers.dynamodb_to_s3`."""

import warnings

# pylint: disable=unused-import
from airflow.providers.amazon.aws.operators.dynamodb_to_s3 import DynamoDBToS3Operator # noqa
from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import DynamoDBToS3Operator # noqa

warnings.warn(
"This module is deprecated. Please use `airflow.providers.amazon.aws.operators.dynamodb_to_s3`.",
"This module is deprecated. Please use `airflow.providers.amazon.aws.transfers.dynamodb_to_s3`.",
DeprecationWarning, stacklevel=2
)
10 changes: 5 additions & 5 deletions airflow/contrib/operators/file_to_gcs.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,30 +16,30 @@
# specific language governing permissions and limitations
# under the License.
"""
This module is deprecated. Please use `airflow.providers.google.cloud.operators.local_to_gcs`.
This module is deprecated. Please use `airflow.providers.google.cloud.transfers.local_to_gcs`.
"""

import warnings

from airflow.providers.google.cloud.operators.local_to_gcs import LocalFilesystemToGCSOperator
from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator

warnings.warn(
"This module is deprecated. Please use `airflow.providers.google.cloud.operators.local_to_gcs`,",
"This module is deprecated. Please use `airflow.providers.google.cloud.transfers.local_to_gcs`,",
DeprecationWarning, stacklevel=2
)


class FileToGoogleCloudStorageOperator(LocalFilesystemToGCSOperator):
"""
This class is deprecated.
Please use `airflow.providers.google.cloud.operators.local_to_gcs.LocalFilesystemToGCSOperator`.
Please use `airflow.providers.google.cloud.transfers.local_to_gcs.LocalFilesystemToGCSOperator`.
"""

def __init__(self, *args, **kwargs):
warnings.warn(
"""This class is deprecated.
Please use
`airflow.providers.google.cloud.operators.local_to_gcs.LocalFilesystemToGCSOperator`.""",
`airflow.providers.google.cloud.transfers.local_to_gcs.LocalFilesystemToGCSOperator`.""",
DeprecationWarning, stacklevel=2
)
super().__init__(*args, **kwargs)
6 changes: 3 additions & 3 deletions airflow/contrib/operators/file_to_wasb.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,14 +15,14 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""This module is deprecated. Please use `airflow.providers.microsoft.azure.operators.file_to_wasb`."""
"""This module is deprecated. Please use `airflow.providers.microsoft.azure.transfers.file_to_wasb`."""

import warnings

# pylint: disable=unused-import
from airflow.providers.microsoft.azure.operators.file_to_wasb import FileToWasbOperator # noqa
from airflow.providers.microsoft.azure.transfers.file_to_wasb import FileToWasbOperator # noqa

warnings.warn(
"This module is deprecated. Please use `airflow.providers.microsoft.azure.operators.file_to_wasb`.",
"This module is deprecated. Please use `airflow.providers.microsoft.azure.transfers.file_to_wasb`.",
DeprecationWarning, stacklevel=2
)
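
For code that imported the relocated modules directly, the change in the shims above amounts to swapping `operators` for `transfers` in the import path. A minimal sketch of that mapping follows. It covers only the modules shown in the hunks above, and `to_transfers_path` is a hypothetical helper, not something this commit provides.

```python
# Module names taken from the deprecation shims shown above; the commit moves
# more modules than are listed here.
MOVED_TO_TRANSFERS = {
    "airflow.providers.google.cloud": [
        "adls_to_gcs",
        "bigquery_to_bigquery",
        "bigquery_to_gcs",
        "bigquery_to_mysql",
        "cassandra_to_gcs",
        "local_to_gcs",
    ],
    "airflow.providers.amazon.aws": ["dynamodb_to_s3"],
    "airflow.providers.microsoft.azure": ["file_to_wasb"],
}


def to_transfers_path(dotted_path):
    """Rewrite a known '<provider>.operators.<module>' path to '<provider>.transfers.<module>'."""
    for prefix, modules in MOVED_TO_TRANSFERS.items():
        for module in modules:
            old = f"{prefix}.operators.{module}"
            # Match the module itself or any attribute inside it.
            if dotted_path == old or dotted_path.startswith(old + "."):
                return f"{prefix}.transfers.{module}" + dotted_path[len(old):]
    # Anything not listed (for example regular operators) is left untouched.
    return dotted_path


print(to_transfers_path("airflow.providers.amazon.aws.operators.dynamodb_to_s3.DynamoDBToS3Operator"))
```

Paths that are not in the table, such as `airflow.providers.google.cloud.operators.datastore`, are returned unchanged, which matches the rule that only transfer operators move to the `transfers` package.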
4 changes: 2 additions & 2 deletions airflow/contrib/operators/gcp_transfer_operator.py
Original file line number Diff line number Diff line change
Expand Up @@ -180,8 +180,8 @@ def __init__(self, *args, **kwargs):
class GoogleCloudStorageToGoogleCloudStorageTransferOperator(CloudDataTransferServiceGCSToGCSOperator):
"""
This class is deprecated.
Please use `airflow.providers.google.cloud.operators.data_transfe
r.CloudDataTransferServiceGCSToGCSOperator`.
Please use `airflow.providers.google.cloud.operators.data_transfer
.CloudDataTransferServiceGCSToGCSOperator`.
"""

def __init__(self, *args, **kwargs):