Standardize airflow build process and switch to Hatchling build backend (#36537)

This PR changes Airflow installation and build backend to use new
standard Python ways of building Python applications.

We've been trying to do it for quite a while. Airflow traditionally
has been using a complex and convoluted build process based on
setuptools and an (extremely) custom setup.py file. It survived
the migration to Airflow 2.0 and the split of the Airflow monorepo
into Airflow and Providers, adding pre-installed providers and
switching providers to use flit (and follow build standards).

So far tooling in the Python ecosystem had not been able to fulfill
our needs and we refrained from developing our own tooling, but
finally with the appearance of Hatch (managed by the Python Packaging
Authority) and a few recent advancements there we are able to switch
to Python standard ways of managing project dependency configuration
and project build setup (with a few customizations).

This PR makes the Airflow build process follow these standard PEPs:

* Airflow has all build configuration stored in pyproject.toml
  following PEP 518, which allows any frontend (`pip`, `poetry`,
  `hatch`, `flit`, or whatever other frontend is used) to
  install the required build dependencies to install Airflow
  locally and to build distribution packages (sdist/wheel)

* The Hatchling backend follows PEP 517 for standard source tree and
  build backend implementation, which allows executing the build in a
  frontend-independent way

* We store all project metadata in pyproject.toml - following
  PEP 621 where all necessary project metadata components were
  defined.

* We plug into Hatchling "editable build" hooks following
  PEP 660. Hatchling internally builds an editable wheel that
  is used as an ephemeral step and communication between backend
  and frontend (and this ephemeral wheel is used to make an
  editable installation of the project - suitable for fast
  iteration of code without reinstalling the package).

With Airflow having many provider packages in a single source tree
where we want to be able to install and develop airflow and
providers together, it is not a small feat to implement the
case where editable installation has to behave quite a bit
differently when it comes to packaging and dependencies for
editable install (when you want to edit sources directly) and
installable package (where you want to have separate Airflow
package and provider packages). Fortunately the standardisation
efforts in the Python Packaging community and the tooling
implementing them have finally made it possible.
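
To make the "editable" versus "standard" difference more concrete,
here is a minimal pure-Python sketch of the kind of decision a build
hook could make. It does not use the Hatchling API and is not the
hook shipped in this PR: the function names, the example dependency
values, and the assumption that devel extras can be recognised by a
"devel" name prefix are all hypothetical.

```python
from typing import Dict, Iterable, List


def wheel_dependencies(
    build_version: str,
    core_deps: Iterable[str],
    preinstalled_providers: Iterable[str],
) -> List[str]:
    """Return the dependency list for an 'editable' or 'standard' wheel.

    Hypothetical helper: pre-installed providers become package
    dependencies only in the 'standard' (distributed) wheel, because in
    an editable install their sources are already in the source tree.
    """
    deps = list(core_deps)
    if build_version == "standard":
        deps.extend(f"apache-airflow-providers-{p}" for p in preinstalled_providers)
    return deps


def wheel_extras(build_version: str, extras: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop development-only extras from the 'standard' wheel.

    Assumption: devel extras are identified by a 'devel' name prefix.
    """
    if build_version == "standard":
        return {k: v for k, v in extras.items() if not k.startswith("devel")}
    return dict(extras)


# Placeholder inputs, for illustration only.
core = ["jinja2"]
providers = ["http"]
extras = {"devel": ["pytest"], "async": ["gevent"]}

print(wheel_dependencies("editable", core, providers))
print(wheel_dependencies("standard", core, providers))
print(wheel_extras("standard", extras))
```

The point of the sketch is that a single project specification can
feed both builds, with the differences applied at build time.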

Some of the important ways how this has been achieved:

* We continue using provider.yaml in providers as the single source
  of truth for per-provider dependencies. We added a possibility
  to specify "devel-dependencies" in provider.yaml so that all
  per-provider dependencies in `generated/provider_dependencies.json`
  and `pyproject.toml` are generated from those dependencies via
  the update-providers-dependencies pre-commit.

* The pyproject.toml file is generally managed manually, but the
  part where provider dependencies and bundle dependencies are used
  is automatically updated by a pre-commit whenever provider
  dependencies change. Those generated provider dependencies contain
  just dependencies of providers - not the provider packages, but
  in the final "standard" wheel file they are replaced with
  "apache-airflow-providers-PROVIDER" dependencies - so that the
  wheel package will only install the provider and use the
  dependencies of that version of provider it installs.

* We are utilising custom hatchling build hooks (PEP 660 standard)
  that allow modifying the 'standard' wheel package on-the-fly when
  the wheel is being prepared by adding preinstalled package
  dependencies (which are not needed in editable build) and by
  removing all devel extras (that are not needed in the PyPI
  distributed wheel package). This solves the conundrum
  of having different "editable" and "standard" behaviour while
  keeping the same project specification in pyproject.toml.

* We added a description of how `Hatch` can be employed as a build
  frontend in order to manage a local virtualenv and install Airflow
  in an editable way easily - while keeping all properties of the
  installed application (including a working airflow cli and
  package metadata discovery) as well as how to use PEP-standard
  ways of building wheel and sdist packages.

* We have a custom step (following PEP-standards) to inject
  airflow-specific build steps - compiling www assets and
  generating git commit hash version to display it in the UI

* We also show how all this makes it easy to manage local
  virtualenvs and editable installations for Airflow contributors -
  without vendor lock-in of the build tools. By following standard
  PEPs, Airflow can be locally and editably installed by anyone
  using any build frontend tool that follows the standards - whether
  you use `pip`, `poetry`, `Hatch`, `flit` or any other frontend
  build tool, Airflow local installation and package building will
  work the same way for all of them, where both "editable" and
  "standard" package preparation is managed by the `hatchling`
  backend in the same way.

* Previously our extras contained a "." which is not a normalized
  name for extras - `pip` and other tools replaced it automatically
  with `_`. This change updates the extra names to contain
  '-' rather than '.' in the name, following PEP-685. This should be
  fully backwards compatible: users will still be able to use "." but
  it will be normalized to "-" in Airflow packages. This is also
  future-proof, as it is expected that all package managers and tools
  will eventually use PEP-685 applied to extras, even if currently
  some of the tools (pip + setuptools) might generate warnings.

* Additionally, this change organizes the documentation around
  the extras and dependencies, explaining the reasoning behind
  all the different extras we have.

* As a bonus (and this is what we used to test it all) we are
  documenting how to use Hatch frontend to:

  * manage multiple Python installations
  * manage multiple Python virtualenv environments
  * build Airflow packages for release management
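
The extra-name normalization mentioned in the list above can be
sketched in a few lines. PEP 685 specifies that extra names are
normalized with the same rule as PEP 503 package names: runs of "-",
"_" and "." are replaced by a single "-" and the result is
lowercased. The function name and the sample extra names below are
illustrative only.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name as specified by PEP 685 (PEP 503 rule)."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Illustrative spellings a user might type; all map to the same extra.
for requested in ("apache.atlas", "apache_atlas", "Apache-Atlas"):
    print(requested, "->", normalize_extra(requested))
```

Under this rule the dotted and underscored spellings resolve to the
same dashed name, which is why the old spelling keeps working for
users.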
potiuk committed Jan 10, 2024
1 parent ead7528 commit c439ab8
Showing 146 changed files with 3,650 additions and 3,045 deletions.
3 changes: 0 additions & 3 deletions .dockerignore
@@ -48,7 +48,6 @@
!.dockerignore
!RELEASE_NOTES.rst
!LICENSE
!MANIFEST.in
!NOTICE
!.github
!empty
@@ -68,8 +67,6 @@
!.bash_completion.d

# Setup/version configuration
!setup.cfg
!setup.py
!pyproject.toml
!manifests
!generated
2 changes: 1 addition & 1 deletion .github/actions/build-prod-images/action.yml
@@ -41,7 +41,7 @@ runs:
shell: bash
run: >
breeze release-management prepare-provider-packages
--package-list-file ./airflow/providers/installed_providers.txt
--package-list-file ./dev/prod_image_installed_providers.txt
--package-format wheel --version-suffix-for-pypi dev0
if: ${{ inputs.build-provider-packages == 'true' }}
- name: "Prepare chicken-eggs provider packages"
39 changes: 16 additions & 23 deletions .github/workflows/ci.yml
@@ -192,7 +192,7 @@ jobs:

# Push early BuildX cache to GitHub Registry in Apache repository, This cache does not wait for all the
# tests to complete - it is run very early in the build process for "main" merges in order to refresh
# cache using the current constraints. This will speed up cache refresh in cases when setup.py
# cache using the current constraints. This will speed up cache refresh in cases when pyproject.toml
# changes or in case of Dockerfile changes. Failure in this step is not a problem (at most it will
# delay cache refresh. It does not attempt to upgrade to newer dependencies.
# We only push CI cache as PROD cache usually does not gain as much from fresh cache because
@@ -486,7 +486,7 @@ jobs:
# And when we prepare them from sources they will have apache-airflow>=X.Y.Z.dev0
shell: bash
run: >
breeze release-management prepare-provider-packages
breeze release-management prepare-provider-packages --include-not-ready-providers
--package-format wheel --version-suffix-for-pypi dev0
${{ needs.build-info.outputs.chicken-egg-providers }}
if: needs.build-info.outputs.chicken-egg-providers != ''
@@ -678,9 +678,9 @@ jobs:
id: cache-doc-inventories
with:
path: ./docs/_inventory_cache/
key: docs-inventory-${{ hashFiles('setup.py','setup.cfg','pyproject.toml;') }}
key: docs-inventory-${{ hashFiles('pyproject.toml;') }}
restore-keys: |
docs-inventory-${{ hashFiles('setup.py','setup.cfg','pyproject.toml;') }}
docs-inventory-${{ hashFiles('pyproject.toml;') }}
docs-inventory-
- name: "Build docs"
run: >
@@ -742,9 +742,9 @@ jobs:
id: cache-doc-inventories
with:
path: ./docs/_inventory_cache/
key: docs-inventory-${{ hashFiles('setup.py','setup.cfg','pyproject.toml;') }}
key: docs-inventory-${{ hashFiles('pyproject.toml;') }}
restore-keys: |
docs-inventory-${{ hashFiles('setup.py','setup.cfg','pyproject.toml;') }}
docs-inventory-${{ hashFiles('pyproject.toml;') }}
docs-inventory-
- name: "Spellcheck docs"
run: >
@@ -773,11 +773,13 @@ jobs:
run: rm -fv ./dist/*
- name: "Prepare provider documentation"
run: >
breeze release-management prepare-provider-documentation --non-interactive
breeze release-management prepare-provider-documentation --include-not-ready-providers
--non-interactive
${{ needs.build-info.outputs.affected-providers-list-as-string }}
- name: "Prepare provider packages: wheel"
run: >
breeze release-management prepare-provider-packages --version-suffix-for-pypi dev0
breeze release-management prepare-provider-packages --include-not-ready-providers
--version-suffix-for-pypi dev0
--package-format wheel ${{ needs.build-info.outputs.affected-providers-list-as-string }}
- name: "Prepare airflow package: wheel"
run: breeze release-management prepare-airflow-package --version-suffix-for-pypi dev0
@@ -846,7 +848,7 @@ jobs:
run: rm -fv ./dist/*
- name: "Prepare provider packages: sdist"
run: >
breeze release-management prepare-provider-packages
breeze release-management prepare-provider-packages --include-not-ready-providers
--version-suffix-for-pypi dev0 --package-format sdist
${{ needs.build-info.outputs.affected-providers-list-as-string }}
- name: "Prepare airflow package: sdist"
@@ -913,7 +915,7 @@ jobs:
run: rm -fv ./dist/*
- name: "Prepare provider packages: wheel"
run: >
breeze release-management prepare-provider-packages
breeze release-management prepare-provider-packages --include-not-ready-providers
--package-format wheel ${{ needs.build-info.outputs.affected-providers-list-as-string }}
- name: >
Remove incompatible Airflow
@@ -922,17 +924,9 @@ jobs:
rm -vf ${{ matrix.remove-providers }}
working-directory: ./dist
if: matrix.remove-providers != ''
- name: "Checkout ${{matrix.airflow-version}} of Airflow"
uses: actions/checkout@v4
with:
persist-credentials: false
ref: ${{matrix.airflow-version}}
path: old-airflow
- name: "Prepare airflow package: wheel"
- name: "Download airflow package: wheel"
run: |
pip install pip==23.3.2 wheel==0.36.2 gitpython==3.1.40
python setup.py egg_info --tag-build ".dev0" bdist_wheel -d ../dist
working-directory: ./old-airflow
pip download "apache-airflow==${{matrix.airflow-version}}" -d dist --no-deps
- name: >
Install and verify all provider packages and airflow on
Airflow ${{matrix.airflow-version}}:Python ${{matrix.python-version}}
@@ -2050,8 +2044,7 @@ jobs:
path: ".build/.k8s-env"
key: "\
k8s-env-${{steps.breeze.outputs.host-python-version}}-\
${{ hashFiles('scripts/ci/kubernetes/k8s_requirements.txt','setup.cfg',\
'setup.py','pyproject.toml','generated/provider_dependencies.json') }}"
${{ hashFiles('scripts/ci/kubernetes/k8s_requirements.txt','pyproject.toml') }}"
- name: Run complete K8S tests ${{needs.build-info.outputs.kubernetes-combos-list-as-string}}
run: breeze k8s run-complete-tests --run-in-parallel --upgrade
env:
@@ -2182,7 +2175,7 @@ jobs:
- name: "Prepare providers packages for PROD build"
run: >
breeze release-management prepare-provider-packages
--package-list-file ./airflow/providers/installed_providers.txt
--package-list-file ./dev/prod_image_installed_providers.txt
--package-format wheel
env:
VERSION_SUFFIX_FOR_PYPI: "dev0"
7 changes: 4 additions & 3 deletions .gitignore
@@ -190,8 +190,6 @@ dmypy.json
log.txt*

# Provider-related ignores
/provider_packages/CHANGELOG.txt
/provider_packages/MANIFEST.in
/airflow/providers/__init__.py

# Docker context files
@@ -219,7 +217,7 @@ pip-wheel-metadata
/dev/Dockerfile.pmc

# Generated UI licenses
licenses/LICENSES-ui.txt
3rd-party-licenses/LICENSES-ui.txt

# Packaged breeze on Windows
/breeze.exe
@@ -240,3 +238,6 @@ licenses/LICENSES-ui.txt

# Dask Executor tests generate this directory
/tests/executors/dask-worker-space/

# airflow-build-dockerfile and correconding ignore file
airflow-build-dockerfile*
45 changes: 20 additions & 25 deletions .pre-commit-config.yaml
@@ -274,11 +274,6 @@ repos:
- --ignore-words=docs/spelling_wordlist.txt
- --skip=airflow/providers/*/*.rst,airflow/www/*.log,docs/*/commits.rst,docs/apache-airflow/tutorial/pipeline_example.csv,*.min.js,*.lock,INTHEWILD.md
- --exclude-file=.codespellignorelines
- repo: https://github.com/abravalheri/validate-pyproject
rev: v0.15
hooks:
- id: validate-pyproject
name: Validate pyproject.toml
- repo: local
# Note that this is the 2nd "local" repo group in the .pre-commit-config.yaml file. This is because
# we try to minimise the number of passes that must happen in order to apply some of the changes
@@ -333,13 +328,6 @@ repos:
files: Dockerfile.*$
pass_filenames: true
require_serial: true
- id: check-setup-order
name: Check order of dependencies in setup.cfg and setup.py
language: python
files: ^setup\.cfg$|^setup\.py$
pass_filenames: false
entry: ./scripts/ci/pre_commit/pre_commit_check_order_setup.py
additional_dependencies: ['rich>=12.4.4']
- id: check-airflow-k8s-not-used
name: Check airflow.kubernetes imports are not used
language: python
@@ -363,14 +351,6 @@ repos:
exclude: ^airflow/kubernetes/|^airflow/providers/
entry: ./scripts/ci/pre_commit/pre_commit_check_cncf_k8s_used_for_k8s_executor_only.py
additional_dependencies: ['rich>=12.4.4']
- id: check-extra-packages-references
name: Checks setup extra packages
description: Checks if all the libraries in setup.py are listed in extra-packages-ref.rst file
language: python
files: ^setup\.py$|^docs/apache-airflow/extra-packages-ref\.rst$|^airflow/providers/.*/provider\.yaml$
pass_filenames: false
entry: ./scripts/ci/pre_commit/pre_commit_check_setup_extra_packages_ref.py
additional_dependencies: ['rich>=12.4.4']
- id: check-airflow-provider-compatibility
name: Check compatibility of Providers with Airflow
entry: ./scripts/ci/pre_commit/pre_commit_check_provider_airflow_compatibility.py
@@ -400,19 +380,34 @@ repos:
files: ^airflow/providers/.*/hooks/.*\.py$
additional_dependencies: ['rich>=12.4.4', 'pyyaml', 'packaging']
- id: update-providers-dependencies
name: Update cross-dependencies for providers packages
name: Update dependencies for provider packages
entry: ./scripts/ci/pre_commit/pre_commit_update_providers_dependencies.py
language: python
files: ^airflow/providers/.*\.py$|^airflow/providers/.*/provider\.yaml$|^tests/providers/.*\.py$|^tests/system/providers/.*\.py$
files: ^airflow/providers/.*\.py$|^airflow/providers/.*/provider\.yaml$|^tests/providers/.*\.py$|^tests/system/providers/.*\.py$|^scripts/ci/pre_commit/pre_commit_update_providers_dependencies\.py$
pass_filenames: false
additional_dependencies: ['setuptools', 'rich>=12.4.4', 'pyyaml']
additional_dependencies: ['setuptools', 'rich>=12.4.4', 'pyyaml', 'tomli']
- id: check-extra-packages-references
name: Checks setup extra packages
description: Checks if all the extras defined in pyproject.toml are listed in extra-packages-ref.rst file
language: python
files: ^docs/apache-airflow/extra-packages-ref\.rst$|^pyproject.toml
pass_filenames: false
entry: ./scripts/ci/pre_commit/pre_commit_check_extra_packages_ref.py
additional_dependencies: ['rich>=12.4.4', 'tomli', 'tabulate']
- id: check-pyproject-toml-order
name: Check order of dependencies in pyproject.toml
language: python
files: ^pyproject\.toml$
pass_filenames: false
entry: ./scripts/ci/pre_commit/pre_commit_check_order_pyproject_toml.py
additional_dependencies: ['rich>=12.4.4']
- id: update-extras
name: Update extras in documentation
entry: ./scripts/ci/pre_commit/pre_commit_insert_extras.py
language: python
files: ^setup\.py$|^CONTRIBUTING\.rst$|^INSTALL$|^airflow/providers/.*/provider\.yaml$
pass_filenames: false
additional_dependencies: ['rich>=12.4.4']
additional_dependencies: ['rich>=12.4.4', 'tomli']
- id: check-extras-order
name: Check order of extras in Dockerfile
entry: ./scripts/ci/pre_commit/pre_commit_check_order_dockerfile_extras.py
@@ -712,7 +707,7 @@ repos:
name: Sort alphabetically and uniquify installed_providers.txt
entry: ./scripts/ci/pre_commit/pre_commit_sort_installed_providers.py
language: python
files: ^\.pre-commit-config\.yaml$|^airflow/providers/installed_providers\.txt$
files: ^\.pre-commit-config\.yaml$|^dev/.*_installed_providers\.txt$
pass_filenames: false
require_serial: true
- id: update-spelling-wordlist-to-be-sorted
5 changes: 3 additions & 2 deletions .rat-excludes
@@ -43,7 +43,8 @@ venv
files
airflow.iml
.gitmodules
installed_providers.txt
prod_image_installed_providers.txt
airflow_pre_installed_providers.txt

# Generated doc files
.*html
@@ -61,7 +62,7 @@ spelling_wordlist.txt
# it is compatible according to http://www.apache.org/legal/resolved.html#category-a
kerberos_auth.py
airflow_api_auth_backend_kerberos_auth_py.html
licenses/*
3rd-party-licenses/*
parallel.js
underscore.js
jquery.dataTables.min.js
15 files renamed without changes.
22 changes: 11 additions & 11 deletions BREEZE.rst
@@ -1569,7 +1569,7 @@ The CI image is built automatically as needed, however it can be rebuilt manuall
Building the image first time pulls a pre-built version of images from the Docker Hub, which may take some
time. But for subsequent source code changes, no wait time is expected.
However, changes to sensitive files like ``setup.py`` or ``Dockerfile.ci`` will trigger a rebuild
However, changes to sensitive files like ``pyproject.toml`` or ``Dockerfile.ci`` will trigger a rebuild
that may take more time though it is highly optimized to only rebuild what is needed.
Breeze has built in mechanism to check if your local image has not diverged too much from the
@@ -2299,7 +2299,7 @@ These are all available flags of ``release-management add-back-references`` comm
Generating constraints
""""""""""""""""""""""
Whenever setup.py gets modified, the CI main job will re-generate constraint files. Those constraint
Whenever ``pyproject.toml`` gets modified, the CI main job will re-generate constraint files. Those constraint
files are stored in separated orphan branches: ``constraints-main``, ``constraints-2-0``.
Those are constraint files as described in detail in the
@@ -2341,14 +2341,14 @@ These are all available flags of ``generate-constraints`` command:
:width: 100%
:alt: Breeze generate-constraints
In case someone modifies setup.py, the scheduled CI Tests automatically upgrades and
In case someone modifies ``pyproject.toml``, the scheduled CI Tests automatically upgrades and
pushes changes to the constraint files, however you can also perform test run of this locally using
the procedure described in the
`Manually generating image cache and constraints <dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md>`_
which utilises multiple processors on your local machine to generate such constraints faster.
This bumps the constraint files to latest versions and stores hash of setup.py. The generated constraint
and setup.py hash files are stored in the ``files`` folder and while generating the constraints diff
This bumps the constraint files to latest versions and stores hash of ``pyproject.toml``. The generated constraint
and ``pyproject.toml`` hash files are stored in the ``files`` folder and while generating the constraints diff
of changes vs the previous constraint files is printed.
Updating constraints
@@ -2697,18 +2697,18 @@ disappear when you exit Breeze shell.
When you want to add dependencies permanently, then it depends what kind of dependency you add.
If you want to add core dependency that should always be installed - you need to add it to ``setup.cfg``
to ``install_requires`` section. If you want to add it to one of the optional core extras, you should
add it in the extra definition in ``setup.py`` (you need to find out where it is defined). If you want
to add it to one of the providers, you need to add it to the ``provider.yaml`` file in the provider
If you want to add core dependency that should always be installed - you need to add it to ``pyproject.toml``
to ``dependencies`` section. If you want to add it to one of the optional core extras, you should
add it in the extra definition in ``pyproject.toml`` (you need to find out where it is defined).
If you want to add it to one of the providers, you need to add it to the ``provider.yaml`` file in the provider
directory - but remember that this should be followed by running pre-commit that will automatically update
the ``generated/provider_dependencies.json`` directory with the new dependencies:
the ``pyproject.toml`` with the new dependencies as the ``provider.yaml`` files are not used directly, they
are used to update ``pyproject.toml`` file:
.. code-block:: bash
pre-commit run update-providers-dependencies --all-files
You can also run the pre-commit by ``breeze static-checks --type update-providers-dependencies --all-files``
command - which provides autocomplete.
4 changes: 2 additions & 2 deletions CI.rst
@@ -617,7 +617,7 @@ those via corresponding command line flags passed to ``breeze shell`` command.
| ``UPGRADE_TO_NEWER_DEPENDENCIES`` | false | false | false\* | Determines whether the build should |
| | | | | attempt to upgrade Python base image and all |
| | | | | PIP dependencies to latest ones matching |
| | | | | ``setup.py`` limits. This tries to replicate |
| | | | | ``pyproject.toml`` limits. Tries to replicate |
| | | | | the situation of "fresh" user who just installs |
| | | | | airflow and uses latest version of matching |
| | | | | dependencies. By default we are using a |
@@ -638,7 +638,7 @@ those via corresponding command line flags passed to ``breeze shell`` command.
| | | | | |
| | | | | Setting the value to random value is best way |
| | | | | to assure that constraints are upgraded even if |
| | | | | there is no change to setup.py |
| | | | | there is no change to ``pyproject.toml`` |
| | | | | |
| | | | | This way our constraints are automatically |
| | | | | tested and updated whenever new versions |