Move provider dependencies to inside provider folders (#24672)
The ``setup.py`` has ALWAYS contained provider dependencies,
but this is really a remnant of Airflow 1.10, where providers
were not separated out into subfolders of "providers".

This change moves all the provider-specific dependencies
to provider.yaml where they are kept together with all other
provider meta-data.

Later, when we move providers out, we can move them to
provider-specific setup.py files (or let provider-specific
setup.py files read them from provider.yaml), but this is
not something we want to do now.

The dependencies.json is now renamed to provider_dependencies.json
and moved to "airflow" so that it can be kept as part of the
sources needed for the sdist package to provide extras. Pre-commit still
generates the file as needed, and it now contains both:

* cross-provider-deps - information about which providers depend on
  each other
* deps - information about which regular dependencies are needed for
  each provider
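
The two kinds of information listed above can be pictured with a minimal
sketch. The key names "cross-provider-deps" and "deps" are taken from this
message; the per-provider nesting, the sample entries, and both helper
functions are assumptions made for illustration and are not the layout or
code of the real generated file.

```python
from __future__ import annotations

import json

# Hypothetical sample data. Only the two key names come from the commit
# message; the nesting and every entry below are illustrative assumptions.
SAMPLE_JSON = """
{
  "http": {"deps": ["example-http-lib>=1.0"], "cross-provider-deps": []},
  "ssh": {"deps": ["example-ssh-lib>=1.0"], "cross-provider-deps": []},
  "sftp": {"deps": ["example-sftp-lib>=1.0"], "cross-provider-deps": ["ssh"]},
  "slack": {"deps": ["example-slack-lib>=1.0"], "cross-provider-deps": ["http"]}
}
"""


def regular_deps(data: dict, provider: str) -> list[str]:
    """Return the regular (non-provider) dependencies of one provider."""
    return list(data[provider]["deps"])


def dependents_of(data: dict, provider: str) -> list[str]:
    """Return the providers that list `provider` as a cross-provider dependency."""
    return sorted(
        name for name, info in data.items() if provider in info["cross-provider-deps"]
    )


data = json.loads(SAMPLE_JSON)
print(regular_deps(data, "sftp"))
print(dependents_of(data, "ssh"))
```

With both kinds of information in one file, a tool could in principle look up
which providers are affected by a change to another provider, which is the
sort of lookup that selecting tests per provider would rely on.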

Besides preparing for splitting out the providers, this has the
advantage that a dependency change for a provider will now always
run the tests for that provider.
potiuk committed Jun 29, 2022
1 parent 41aa9ab commit 0de31bd
Showing 108 changed files with 2,433 additions and 1,343 deletions.
2 changes: 1 addition & 1 deletion .dockerignore
@@ -74,12 +74,12 @@
!setup.cfg
!setup.py
!manifests
!generated
# Now - ignore unnecessary files inside allowed directories
# This goes after the allowed directories

# Git version is dynamically generated
airflow/git_version

# Exclude static www files generated by NPM
airflow/www/static/coverage
airflow/www/static/dist
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -840,7 +840,7 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
if: always()

prepare-test-provider-packages-sdist:
timeout-minutes: 40
timeout-minutes: 80
name: "Build and test provider packages sdist"
runs-on: ${{ fromJson(needs.build-info.outputs.runsOn) }}
needs: [build-info, wait-for-ci-images]
16 changes: 4 additions & 12 deletions .pre-commit-config.yaml
@@ -363,11 +363,11 @@ repos:
pass_filenames: false
- id: update-providers-dependencies
name: Update cross-dependencies for providers packages
entry: ./scripts/ci/pre_commit/pre_commit_build_providers_dependencies.sh
entry: ./scripts/ci/pre_commit/pre_commit_build_providers_dependencies.py
language: python
files: ^airflow/providers/.*\.py$|^tests/providers/.*\.py$
files: ^airflow/providers/.*\.py$|^tests/providers/.*\.py$|^tests/system/providers/.*\.py$|$airflow/providers/.*/provider.yaml$
pass_filenames: false
additional_dependencies: ['setuptools']
additional_dependencies: ['setuptools', 'rich>=12.4.4', 'pyyaml']
- id: update-extras
name: Update extras in documentation
entry: ./scripts/ci/pre_commit/pre_commit_insert_extras.py
@@ -620,14 +620,6 @@ repos:
additional_dependencies: ['pyyaml', 'jinja2', 'black==22.3.0', 'tabulate', 'rich>=12.4.4']
require_serial: true
pass_filenames: false
- id: check-airflow-providers-have-extras
name: Checks providers available when declared by extras in setup.py
language: python
entry: ./scripts/ci/pre_commit/pre_commit_check_extras_have_providers.py
files: ^setup\.py$|^airflow/providers/.*\.py$
pass_filenames: false
require_serial: true
additional_dependencies: ['rich>=12.4.4']
- id: update-breeze-readme-config-hash
name: Update Breeze README.md with config files hash
language: python
@@ -863,5 +855,5 @@ repos:
entry: ./scripts/ci/pre_commit/pre_commit_migration_reference.py
pass_filenames: false
files: ^airflow/migrations/versions/.*\.py$|^docs/apache-airflow/migrations-ref\.rst$
additional_dependencies: ['rich>=12.4.4', 'inputimeout']
additional_dependencies: ['rich>=12.4.4', 'inputimeout', 'markdown-it-py']
## ONLY ADD PRE-COMMITS HERE THAT REQUIRE CI IMAGE
65 changes: 20 additions & 45 deletions CONTRIBUTING.rst
@@ -637,7 +637,23 @@ Provider packages
Airflow 2.0 is split into core and providers. They are delivered as separate packages:

* ``apache-airflow`` - core of Apache Airflow
* ``apache-airflow-providers-*`` - More than 50 provider packages to communicate with external services
* ``apache-airflow-providers-*`` - More than 70 provider packages to communicate with external services

The information/meta-data about the providers is kept in ``provider.yaml`` file in the right sub-directory
of ``airflow\providers``. This file contains:

* package name (``apache-airflow-provider-*``)
* user-facing name of the provider package
* description of the package that is available in the documentation
* list of versions of package that have been released so far
* list of dependencies of the provider package
* list of additional-extras that the provider package provides (together with dependencies of those extras)
* list of integrations, operators, hooks, sensors, transfers provided by the provider (useful for documentation generation)
* list of connection types, extra-links, secret backends, auth backends, and logging handlers (useful to both
register them as they are needed by Airflow and to include them in documentation automatically).

If you want to add dependencies to the provider, you should add them to the corresponding ``provider.yaml``
and Airflow pre-commits and package generation commands will use them when preparing package information.

In Airflow 1.10 all those providers were installed together within one single package and when you installed
airflow locally, from sources, they were also installed. In Airflow 2.0, providers are separated out,
@@ -656,7 +672,7 @@ in this airflow folder - the providers package is importable.
Some of the packages have cross-dependencies with other providers packages. This typically happens for
transfer operators where operators use hooks from the other providers in case they are transferring
data between the providers. The list of dependencies is maintained (automatically with pre-commits)
in the ``airflow/providers/dependencies.json``. Pre-commits are also used to generate dependencies.
in the ``generated/provider_dependencies.json``. Pre-commits are also used to generate dependencies.
The dependency list is automatically used during PyPI packages generation.

Cross-dependencies between provider packages are converted into extras - if you need functionality from
@@ -666,49 +682,8 @@ the other provider package you can install it adding [extra] after the
transfer operators from Amazon ECS.

If you add a new dependency between different providers packages, it will be detected automatically during
pre-commit phase and pre-commit will fail - and add entry in dependencies.json so that the package extra
dependencies are properly added when package is installed.

You can regenerate the whole list of provider dependencies by running this command (you need to have
``pre-commits`` installed).

.. code-block:: bash
pre-commit run build-providers-dependencies
Here is the list of packages and their extras:


.. START PACKAGE DEPENDENCIES HERE
========================== ===========================
Package Extras
========================== ===========================
airbyte http
amazon apache.hive,cncf.kubernetes,exasol,ftp,google,imap,mongo,salesforce,ssh
apache.beam google
apache.druid apache.hive
apache.hive amazon,microsoft.mssql,mysql,presto,samba,vertica
apache.livy http
dbt.cloud http
dingding http
discord http
google amazon,apache.beam,apache.cassandra,cncf.kubernetes,facebook,microsoft.azure,microsoft.mssql,mysql,oracle,postgres,presto,salesforce,sftp,ssh,trino
hashicorp google
microsoft.azure google,oracle,sftp
mysql amazon,presto,trino,vertica
postgres amazon
presto google,slack
salesforce tableau
sftp ssh
slack http
snowflake slack
trino google
========================== ===========================

.. END PACKAGE DEPENDENCIES HERE
and pre-commit will generate new entry in ``generated/provider_dependencies.json`` so that
the package extra dependencies are properly handled when package is installed.

Developing community managed provider packages
----------------------------------------------
2 changes: 1 addition & 1 deletion Dockerfile.ci
@@ -1321,8 +1321,8 @@ RUN REMOVE_ARTIFACTS="false" BUILD_TYPE="build" bash /scripts/docker/compile_www
# So in case setup.py changes we can install latest dependencies required.
COPY setup.py ${AIRFLOW_SOURCES}/setup.py
COPY setup.cfg ${AIRFLOW_SOURCES}/setup.cfg

COPY airflow/__init__.py ${AIRFLOW_SOURCES}/airflow/
COPY generated/provider_dependencies.json ${AIRFLOW_SOURCES}/generated/

COPY --from=scripts install_airflow.sh /scripts/docker/

1 change: 1 addition & 0 deletions MANIFEST.in
@@ -38,3 +38,4 @@ include airflow/customized_form_field_behaviours.schema.json
include airflow/serialization/schema.json
include airflow/utils/python_virtualenv_script.jinja2
include airflow/utils/context.pyi
include generated
2 changes: 0 additions & 2 deletions STATIC_CODE_CHECKS.rst
@@ -140,8 +140,6 @@ require Breeze Docker image to be build locally.
+--------------------------------------------------------+------------------------------------------------------------------+---------+
| check-airflow-config-yaml-consistent | Checks for consistency between config.yml and default_config.cfg | |
+--------------------------------------------------------+------------------------------------------------------------------+---------+
| check-airflow-providers-have-extras | Checks providers available when declared by extras in setup.py | |
+--------------------------------------------------------+------------------------------------------------------------------+---------+
| check-apache-license-rat | Check if licenses are OK for Apache | |
+--------------------------------------------------------+------------------------------------------------------------------+---------+
| check-base-operator-partial-arguments | Check BaseOperator and partial() arguments | |