Run mypy checks for full packages in CI (#36638)
MyPy, as used in our static checks, applies slightly different heuristics
when running on individual files than when running on whole packages. This
sometimes causes semi-random failures, because pre-commit may split the
files differently between parallel processes and hand each mypy invocation
a different set of files.

The regular `mypy-*` pre-commits work by passing filenames to the mypy
checks, and when the `--all-files` flag is used, this means that 2700
files are passed. In that case pre-commit splits such a long list of
files into several sequential mypy executions. This is not ideal, because
mypy can split the list differently depending on the files passed, and
the results can change when just the list of files changes - so mypy
might start detecting problems that were not detected before.
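
To illustrate the difference (a simplified sketch - the real checks drive
mypy through pre-commit wrapper scripts, so the exact commands differ):

```bash
# File-based runs: each chunk of the split file list is type-checked in
# isolation, so an inconsistency between files that land in different
# chunks can go unnoticed.
mypy airflow/models/dag.py
mypy airflow/models/taskinstance.py

# Package-based run: mypy resolves the whole import graph in one pass, so
# the result no longer depends on how the file list was chunked.
mypy -p airflow
```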

This PR introduces a new `mypy` check that runs mypy for whole packages
rather than individual files. We cannot use it for local pre-commit runs,
because in many cases such a package-based mypy check will run for minutes
when a single file changes, due to cache invalidation rules - and we do
not want to penalise commits that change common Airflow code (such PRs
would invalidate a large part of the mypy cache every time a common file
changes). So we still want to run file-based mypy for local commits. In
CI, however, rather than passing 2700 files, we want to run the mypy
checks "per package".
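
The package-based run can still be reproduced locally when needed - for
example for the `airflow` package (expect it to take a while on a cold
cache):

```bash
# Manual-stage hook: checks a whole package instead of the changed files.
MYPY_PACKAGES="airflow" pre-commit run --hook-stage manual mypy --all-files
```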

This PR introduces a new "manual"-stage mypy pre-commit hook that runs
package-based mypy checks, adds selective-check rules that decide when to
run those checks, and adds a separate, matrix-based CI job that runs the
mypy checks independently for each of the packages: "airflow",
"providers", "docs", "dev".
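
The fan-out works roughly as follows (the exact JSON shape of the
selective-checks output below is an assumption; the per-job command
matches the CI step in this diff):

```bash
# Selective checks emit something like:
#   mypy-packages=["airflow", "providers", "docs", "dev"]
# and each matrix job then runs the manual hook for its own package:
MYPY_PACKAGES="providers" pre-commit run --color always --verbose \
    --hook-stage manual mypy --all-files
```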

This job also skips the providers checks on non-main branches and runs
all the checks when "full tests needed" is requested.

This PR also ignores some errors coming from 3rd-party libraries that
randomly appear when certain files are modified (and fixes the current
main failures).
potiuk committed Jan 7, 2024
1 parent 75bc05c commit f7b663d
Showing 30 changed files with 443 additions and 130 deletions.
56 changes: 55 additions & 1 deletion .github/workflows/ci.yml
@@ -104,6 +104,8 @@ jobs:
      ci-image-build: ${{ steps.selective-checks.outputs.ci-image-build }}
      prod-image-build: ${{ steps.selective-checks.outputs.prod-image-build }}
      docs-build: ${{ steps.selective-checks.outputs.docs-build }}
      mypy-packages: ${{ steps.selective-checks.outputs.mypy-packages }}
      needs-mypy: ${{ steps.selective-checks.outputs.needs-mypy }}
      needs-helm-tests: ${{ steps.selective-checks.outputs.needs-helm-tests }}
      needs-api-tests: ${{ steps.selective-checks.outputs.needs-api-tests }}
      needs-api-codegen: ${{ steps.selective-checks.outputs.needs-api-codegen }}
@@ -510,7 +512,6 @@ jobs:
retention-days: 7
if-no-files-found: error


  static-checks:
    timeout-minutes: 45
    name: "Static checks"
@@ -550,6 +551,57 @@ jobs:
          DEFAULT_BRANCH: ${{ needs.build-info.outputs.default-branch }}
          RUFF_FORMAT: "github"

  # Runs static checks for groups of files in the repository in a single process without passing
  # the list of files
  mypy:
    timeout-minutes: 45
    name: "MyPy checks"
    runs-on: ${{fromJSON(needs.build-info.outputs.runs-on)}}
    needs: [build-info, wait-for-ci-images]
    strategy:
      fail-fast: false
      matrix:
        mypy-package: ${{fromJson(needs.build-info.outputs.mypy-packages)}}
    env:
      RUNS_ON: "${{needs.build-info.outputs.runs-on}}"
      PYTHON_MAJOR_MINOR_VERSION: "${{needs.build-info.outputs.default-python-version}}"
      UPGRADE_TO_NEWER_DEPENDENCIES: "${{ needs.build-info.outputs.upgrade-to-newer-dependencies }}"
    steps:
      - name: Cleanup repo
        run: docker run -v "${GITHUB_WORKSPACE}:/workspace" -u 0:0 bash -c "rm -rf /workspace/*"
        if: needs.build-info.outputs.needs-mypy == 'true'
      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
        uses: actions/checkout@v4
        with:
          persist-credentials: false
        if: needs.build-info.outputs.needs-mypy == 'true'
      - name: >
          Prepare breeze & CI image: ${{needs.build-info.outputs.default-python-version}}:${{env.IMAGE_TAG}}
        uses: ./.github/actions/prepare_breeze_and_image
        id: breeze
        if: needs.build-info.outputs.needs-mypy == 'true'
      - name: Cache pre-commit envs
        uses: actions/cache@v3
        with:
          path: ~/.cache/pre-commit
          # yamllint disable-line rule:line-length
          key: "pre-commit-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
          restore-keys: |
            pre-commit-${{steps.breeze.outputs.host-python-version}}-
        if: needs.build-info.outputs.needs-mypy == 'true'
      - name: "MyPy checks for ${{ matrix.mypy-package }}"
        run: |
          pip install pre-commit
          pre-commit run --color always --verbose --hook-stage manual mypy --all-files
        env:
          VERBOSE: "false"
          COLUMNS: "250"
          SKIP_GROUP_OUTPUT: "true"
          DEFAULT_BRANCH: ${{ needs.build-info.outputs.default-branch }}
          RUFF_FORMAT: "github"
          MYPY_PACKAGES: ${{ matrix.mypy-package }}
        if: needs.build-info.outputs.needs-mypy == 'true'

  # Those checks are run if no image needs to be built for checks. This is for simple changes that
  # Do not touch any of the python code or any of the important files that might require building
  # The CI Docker image and they can be run entirely using the pre-commit virtual environments on host
@@ -2041,6 +2093,7 @@ jobs:
      - wait-for-ci-images
      - wait-for-prod-images
      - static-checks
      - mypy
      - tests-sqlite
      - tests-mysql
      - tests-postgres
@@ -2195,6 +2248,7 @@ jobs:
      - build-docs
      - spellcheck-docs
      - static-checks
      - mypy
      - tests-sqlite
      - tests-mysql
      - tests-postgres
13 changes: 11 additions & 2 deletions .pre-commit-config.yaml
@@ -736,7 +736,7 @@ repos:
      # Keep dependency versions in sync w/ airflow/www/package.json
      additional_dependencies: ['[email protected]', '[email protected]', '[email protected]']
    - id: compile-www-assets
      name: Compile www assets
      name: Compile www assets (manual)
      language: node
      stages: ['manual']
      'types_or': [javascript, ts, tsx]
@@ -745,7 +745,7 @@
      pass_filenames: false
      additional_dependencies: ['[email protected]']
    - id: compile-www-assets-dev
      name: Compile www assets in dev mode
      name: Compile www assets in dev mode (manual)
      language: node
      stages: ['manual']
      'types_or': [javascript, ts, tsx]
@@ -1105,6 +1105,15 @@ repos:
      exclude: ^docs/rtd-deprecation
      require_serial: true
      additional_dependencies: ['rich>=12.4.4']
    - id: mypy
      stages: ['manual']
      name: Run mypy for specified packages (manual)
      language: python
      entry: ./scripts/ci/pre_commit/pre_commit_mypy.py
      pass_filenames: false
      files: ^.*\.py$
      require_serial: true
      additional_dependencies: ['rich>=12.4.4']
    - id: check-provider-yaml-valid
      name: Validate provider.yaml files
      entry: ./scripts/ci/pre_commit/pre_commit_check_provider_yaml_files.py
32 changes: 30 additions & 2 deletions STATIC_CODE_CHECKS.rst
@@ -114,6 +114,13 @@ Available pre-commit checks
This table lists pre-commit hooks used by Airflow. The ``Image`` column indicates which hooks
require Breeze Docker image to be built locally.

.. note:: Manual pre-commits

    Most of the checks we run are configured to run automatically when you commit the code. However,
    there are some checks that are not run automatically and you need to run them manually. Those
    checks are marked with ``manual`` in the ``Description`` column in the table below. You can run
    them manually by running ``pre-commit run --hook-stage manual <hook-id>``.

.. note:: Disabling particular checks

    In case you have a problem with running particular ``pre-commit`` check you can still continue using the
@@ -127,6 +134,24 @@ require Breeze Docker image to be built locally.
    the image by setting ``SKIP_BREEZE_PRE_COMMITS`` to "true". This will mark the tests as "green" automatically
    when run locally (note that those checks will anyway run in CI).

.. note:: Mypy checks

    When we run mypy checks locally while committing a change, one of the ``mypy-*`` checks is run
    (``mypy-core``, ``mypy-dev``, ``mypy-providers`` or ``mypy-docs``, depending on the files you are
    changing). The mypy checks are run by passing those changed files to mypy. This is much faster than
    running checks for all files, even if the mypy cache is used - especially when you change a file in
    Airflow core that is imported and used by many files. However, in some cases it produces different
    results than running the checks for the whole set of files, because ``mypy`` does not know that some
    types are defined in other files and it might not be able to follow imports properly if they are
    dynamic. Therefore, in CI we run ``mypy`` checks for whole directories (``airflow`` - excluding
    providers, ``airflow/providers``, ``dev`` and ``docs``) to make sure that we catch all ``mypy``
    errors - so you may see different results when running mypy locally and in CI. If you want to run
    mypy checks for all files locally, you can do it by running the following command (example for
    ``airflow`` files):

    .. code-block:: bash

        MYPY_PACKAGES="airflow" pre-commit run --hook-stage manual mypy --all-files

.. note:: Mypy volume cache

    MyPy uses a separate docker-volume (called ``mypy-cache-volume``) that keeps the cache of last MyPy
@@ -135,6 +160,7 @@ require Breeze Docker image to be built locally.
    is the hard problem in computer science). This might happen for example when we upgrade MyPy. In such
    cases you might need to manually remove the cache volume by running ``breeze down --cleanup-mypy-cache``.


.. BEGIN AUTO-GENERATED STATIC CHECK LIST
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
@@ -258,9 +284,9 @@ require Breeze Docker image to be built locally.
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| codespell | Run codespell to check for common misspellings in files | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| compile-www-assets | Compile www assets | |
| compile-www-assets | Compile www assets (manual) | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| compile-www-assets-dev | Compile www assets in dev mode | |
| compile-www-assets-dev | Compile www assets in dev mode (manual) | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| create-missing-init-py-files-tests | Create missing init.py files in tests | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
@@ -316,6 +342,8 @@ require Breeze Docker image to be built locally.
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| mixed-line-ending | Detect if mixed line ending is used (\r vs. \r\n) | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| mypy | Run mypy for specified packages (manual) | * |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| mypy-core | Run mypy for core | * |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| mypy-dev | Run mypy for dev | * |
20 changes: 11 additions & 9 deletions airflow/decorators/__init__.pyi
@@ -62,7 +62,7 @@ __all__ = [

class TaskDecoratorCollection:
    @overload
    def python(
    def python(  # type: ignore[misc]
        self,
        *,
        multiple_outputs: bool | None = None,
@@ -90,7 +90,7 @@ class TaskDecoratorCollection:
    def python(self, python_callable: Callable[FParams, FReturn]) -> Task[FParams, FReturn]: ...
    # [END mixin_for_typing]
    @overload
    def __call__(
    def __call__(  # type: ignore[misc]
        self,
        *,
        multiple_outputs: bool | None = None,
@@ -103,7 +103,7 @@ class TaskDecoratorCollection:
    def __call__(self, python_callable: Callable[FParams, FReturn]) -> Task[FParams, FReturn]:
        """Aliasing ``python``; signature should match exactly."""
    @overload
    def virtualenv(
    def virtualenv(  # type: ignore[misc]
        self,
        *,
        multiple_outputs: bool | None = None,
@@ -189,7 +189,9 @@ class TaskDecoratorCollection:
        such as transmission a large amount of XCom to TaskAPI.
        """
    @overload
    def branch(self, *, multiple_outputs: bool | None = None, **kwargs) -> TaskDecorator:
    def branch(  # type: ignore[misc]
        self, *, multiple_outputs: bool | None = None, **kwargs
    ) -> TaskDecorator:
        """Create a decorator to wrap the decorated callable into a BranchPythonOperator.
        For more information on how to use this decorator, see :ref:`concepts:branching`.
@@ -201,7 +203,7 @@ class TaskDecoratorCollection:
    @overload
    def branch(self, python_callable: Callable[FParams, FReturn]) -> Task[FParams, FReturn]: ...
    @overload
    def branch_virtualenv(
    def branch_virtualenv(  # type: ignore[misc]
        self,
        *,
        multiple_outputs: bool | None = None,
@@ -294,7 +296,7 @@ class TaskDecoratorCollection:
        self, python_callable: Callable[FParams, FReturn]
    ) -> Task[FParams, FReturn]: ...
    @overload
    def short_circuit(
    def short_circuit(  # type: ignore[misc]
        self,
        *,
        multiple_outputs: bool | None = None,
@@ -629,7 +631,7 @@ class TaskDecoratorCollection:
        :param progress_callback: Callback function for receiving k8s container logs.
        """
    @overload
    def sensor(
    def sensor(  # type: ignore[misc]
        self,
        *,
        poke_interval: float = ...,
@@ -666,7 +668,7 @@ class TaskDecoratorCollection:
    @overload
    def sensor(self, python_callable: Callable[FParams, FReturn] | None = None) -> Task[FParams, FReturn]: ...
    @overload
    def pyspark(
    def pyspark(  # type: ignore[misc]
        self,
        *,
        multiple_outputs: bool | None = None,
@@ -688,7 +690,7 @@ class TaskDecoratorCollection:
        self, python_callable: Callable[FParams, FReturn] | None = None
    ) -> Task[FParams, FReturn]: ...
    @overload
    def bash(
    def bash(  # type: ignore[misc]
        self,
        *,
        env: dict[str, str] | None = None,
3 changes: 2 additions & 1 deletion airflow/models/taskreschedule.py
@@ -38,6 +38,7 @@

    from airflow.models.operator import Operator
    from airflow.models.taskinstance import TaskInstance
    from airflow.serialization.pydantic.taskinstance import TaskInstancePydantic


class TaskReschedule(TaskInstanceDependencies):
@@ -103,7 +104,7 @@ def __init__(
    @classmethod
    def stmt_for_task_instance(
        cls,
        ti: TaskInstance,
        ti: TaskInstance | TaskInstancePydantic,
        *,
        try_number: int | None = None,
        descending: bool = False,
2 changes: 1 addition & 1 deletion airflow/operators/latest_only.py
@@ -45,7 +45,7 @@ class LatestOnlyOperator(BaseBranchOperator):
    def choose_branch(self, context: Context) -> str | Iterable[str]:
        # If the DAG Run is externally triggered, then return without
        # skipping downstream tasks
        dag_run: DagRun = context["dag_run"]
        dag_run: DagRun = context["dag_run"]  # type: ignore[assignment]
        if dag_run.external_trigger:
            self.log.info("Externally triggered DAG_Run: allowing execution to proceed.")
            return list(context["task"].get_direct_relative_ids(upstream=False))
4 changes: 2 additions & 2 deletions airflow/providers/apache/druid/hooks/druid.py
@@ -200,7 +200,7 @@ def get_uri(self) -> str:
        endpoint = conn.extra_dejson.get("endpoint", "druid/v2/sql")
        return f"{conn_type}://{host}/{endpoint}"

    def set_autocommit(self, conn: connect, autocommit: bool) -> NotImplementedError:
    def set_autocommit(self, conn: connect, autocommit: bool) -> None:
        raise NotImplementedError()

    def insert_rows(
@@ -211,5 +211,5 @@ def insert_rows(
        commit_every: int = 1000,
        replace: bool = False,
        **kwargs: Any,
    ) -> NotImplementedError:
    ) -> None:
        raise NotImplementedError()
4 changes: 2 additions & 2 deletions airflow/providers/databricks/hooks/databricks_sql.py
@@ -174,7 +174,7 @@ def get_conn(self) -> Connection:
        )
        return self._sql_conn

    @overload
    @overload  # type: ignore[override]
    def run(
        self,
        sql: str | Iterable[str],
@@ -249,7 +249,7 @@ def run(
                self.set_autocommit(conn, autocommit)

                with closing(conn.cursor()) as cur:
                    self._run_command(cur, sql_statement, parameters)
                    self._run_command(cur, sql_statement, parameters)  # type: ignore[attr-defined]
                    if handler is not None:
                        raw_result = handler(cur)
                        if self.return_tuple:
6 changes: 4 additions & 2 deletions airflow/providers/exasol/hooks/exasol.py
@@ -162,7 +162,7 @@ def get_description(statement: ExaStatement) -> Sequence[Sequence]:
        )
        return cols

    @overload
    @overload  # type: ignore[override]
    def run(
        self,
        sql: str | Iterable[str],
@@ -232,7 +232,9 @@ def run(
                with closing(conn.execute(sql_statement, parameters)) as exa_statement:
                    self.log.info("Running statement: %s, parameters: %s", sql_statement, parameters)
                    if handler is not None:
                        result = self._make_common_data_structure(handler(exa_statement))
                        result = self._make_common_data_structure(  # type: ignore[attr-defined]
                            handler(exa_statement)
                        )
                        if return_single_query_results(sql, return_last, split_statements):
                            _last_result = result
                            _last_columns = self.get_description(exa_statement)
2 changes: 1 addition & 1 deletion airflow/providers/google/cloud/hooks/cloud_run.py
@@ -31,7 +31,7 @@
    RunJobRequest,
    UpdateJobRequest,
)
from google.longrunning import operations_pb2
from google.longrunning import operations_pb2  # type: ignore[attr-defined]

from airflow.exceptions import AirflowException
from airflow.providers.google.common.consts import CLIENT_INFO
2 changes: 1 addition & 1 deletion airflow/providers/google/cloud/operators/bigquery.py
@@ -449,7 +449,7 @@ def execute(self, context: Context) -> None:  # type: ignore[override]
        # job.result() returns a RowIterator. Mypy expects an instance of SupportsNext[Any] for
        # the next() call which the RowIterator does not resemble to. Hence, ignore the arg-type error.
        records = next(job.result())  # type: ignore[arg-type]
        self.check_value(records)
        self.check_value(records)  # type: ignore[attr-defined]
        self.log.info("Current state of job %s is %s", job.job_id, job.state)

    @staticmethod
2 changes: 1 addition & 1 deletion airflow/providers/google/cloud/triggers/cloud_run.py
@@ -25,7 +25,7 @@
from airflow.triggers.base import BaseTrigger, TriggerEvent

if TYPE_CHECKING:
    from google.longrunning import operations_pb2
    from google.longrunning import operations_pb2  # type: ignore[attr-defined]

DEFAULT_BATCH_LOCATION = "us-central1"

@@ -30,7 +30,7 @@
import google.auth
import google.auth.credentials
import google.oauth2.service_account
from google.auth import impersonated_credentials
from google.auth import impersonated_credentials  # type: ignore[attr-defined]
from google.auth.environment_vars import CREDENTIALS, LEGACY_PROJECT, PROJECT

from airflow.exceptions import AirflowException
