Skip to content

Separate out creation of default Connections for tests and non-tests #52129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

amoghrajesh
Copy link
Contributor

@amoghrajesh amoghrajesh commented Jun 24, 2025

related: #51873

Why?

The create_default_connections is a well known utility we use to setup the default connections in Airflow, be it for the breeze setup, test setup or generally, maybe in connections command too.

While working on moving BaseHook to the task SDK, i happened to notice 100's if not 1000's of failures of the form:

__ ERROR at setup of TestBigQueryHookMethods.test_query_results[a,b-result2] ___
providers/google/tests/unit/google/cloud/hooks/test_bigquery.py:82: in setup_method
    self.hook = MockedBigQueryHook()
providers/google/src/airflow/providers/google/cloud/hooks/bigquery.py:168: in __init__
    super().__init__(**kwargs)
providers/google/src/airflow/providers/google/common/hooks/base_google.py:284: in __init__
    self.extras: dict = self.get_connection(self.gcp_conn_id).extra_dejson
task-sdk/src/airflow/sdk/bases/hook.py:61: in get_connection
    conn = Connection.get(conn_id)
task-sdk/src/airflow/sdk/definitions/connection.py:152: in get
    return _get_connection(conn_id)
task-sdk/src/airflow/sdk/execution_time/context.py:153: in _get_connection
    from airflow.sdk.execution_time.task_runner import SUPERVISOR_COMMS
E   ImportError: cannot import name 'SUPERVISOR_COMMS' from 'airflow.sdk.execution_time.task_runner' (/opt/airflow/task-sdk/src/airflow/sdk/execution_time/task_runner.py)

This is partly due to the fact that mock_supervisor_comms fixture isn't injected into them, and second being the fact that even if it is, I have to add mock_supervisor_comms.send.return_value = conn_result to each of those 1000 tests which is absolutely impractical.

So, if i could make the mock_supervisor_comms as autouse and have the tests utilitise the fact that Airflow does Env backend as default and create env's for those "default" connections, the tests would not only be faster because they perform env vs DB lookup (earlier) but also benefit from the fact that provider tests need not essentially use the DB at all. (or cannot in the longer run when we move them entirely to task SDK's execution flow)

Changes Made

  • Separated creation of connections in DB only to adhere the non-test use cases
  • Extracted out the creation of connections for tests to follow env vars instead
  • Used c.as_json() for environment variable values
  • Renamed clear_db_connections to clear_test_connections

Some more trivial examples here

  1. breeze start-airflow --backend postgres --executor CeleryExecutor --dev-mode --db-reset --load-default-connections still works

image

  1. airflow connections list still works and creates connections as needed
root@ab6aa532b249:/opt/airflow# airflow connections list
   |       |       |       |       |       |       |       |       |       | is_ex |       |
   |       |       | descr |       |       |       |       |       | is_en | tra_e | extra |
   | conn_ | conn_ | iptio |       | schem |       | passw |       | crypt | ncryp | _dejs |
id | id    | type  | n     | host  | a     | login | ord   | port  | ed    | ted   | on    | get_uri
===+=======+=======+=======+=======+=======+=======+=======+=======+=======+=======+=======+========
1  | airfl | mysql | None  | mysql | airfl | root  | None  | None  | False | False | {}    | mysql:/
   | ow_db |       |       |       | ow    |       |       |       |       |       |       | /root@m
   |       |       |       |       |       |       |       |       |       |       |       | ysql/ai
   |       |       |       |       |       |       |       |       |       |       |       | rflow
2  | athen | athen | None  | None  | None  | None  | None  | None  | False | False | {}    | athena:
   | a_def | a     |       |       |       |       |       |       |       |       |       | //
   | ault  |       |       |       |       |       |       |       |       |       |       |
3  | aws_d | aws   | None  | None  | None  | None  | None  | None  | False | False | {}    | aws://
   | efaul |       |       |       |       |       |       |       |       |       |       |
   | t     |       |       |       |       |       |       |       |       |       |       |
4  | azure | azure | None  | None  | None  | <ACCO | None  | None  | False | False | {'acc | azure-b
   | _batc | _batc |       |       |       | UNT_N |       |       |       |       | ount_ | atch://
   | h_def | h     |       |       |       | AME>  |       |       |       |       | url': | %3CACCO
   | ault  |       |       |       |       |       |       |       |       |       |       | UNT_NAM
   |       |       |       |       |       |       |       |       |       |       | '<ACC | E%3E@/?
   |       |       |       |       |       |       |       |       |       |       | OUNT_ | account
   |       |       |       |       |       |       |       |       |       |       | URL>' | _url=%3
   |       |       |       |       |       |       |       |       |       |       | }     | CACCOUN
   |       |       |       |       |       |       |       |       |       |       |       | T_URL%3
   |       |       |       |       |       |       |       |       |       |       |       | E
5  | azure | azure | None  | None  | None  | None  | None  | None  | False | False | {'dat | azure-c
   | _cosm | _cosm |       |       |       |       |       |       |       |       | abase | osmos:/
   | os_de | os    |       |       |       |       |       |       |       |       | _name | //?data
   | fault |       |       |       |       |       |       |       |       |       | ':    | base_na
   |       |       |       |       |       |       |       |       |       |       | '<DAT | me=%3CD
   |       |       |       |       |       |       |       |       |       |       | ABASE | ATABASE
   |       |       |       |       |       |       |       |       |       |       | _NAME | _NAME%3
   |       |       |       |       |       |       |       |       |       |       | >',   | E&colle
   |       |       |       |       |       |       |       |       |       |       | 'coll | ction_n
   |       |       |       |       |       |       |       |       |       |       | ectio | ame=%3C
   |       |       |       |       |       |       |       |       |       |       | n_nam | COLLECT
   |       |       |       |       |       |       |       |       |       |       | e':   | ION_NAM
   |       |       |       |       |       |       |       |       |       |       | '<COL | E%3E
   |       |       |       |       |       |       |       |       |       |       | LECTI |
   |       |       |       |       |       |       |       |       |       |       | ON_NA |
   |       |       |       |       |       |       |       |       |       |       | ME>'} 

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@amoghrajesh amoghrajesh requested review from ashb, potiuk and kaxil June 24, 2025 06:34
@amoghrajesh amoghrajesh self-assigned this Jun 24, 2025
@amoghrajesh amoghrajesh changed the title Extending create_default_connections to store in ENV vars too Separate out creation of default Connections for tests and non-tests Jun 24, 2025
@amoghrajesh amoghrajesh force-pushed the migrate-create-default-connections-to-do-env-too branch from 8ac2182 to 3ec02ac Compare June 24, 2025 07:41
@amoghrajesh amoghrajesh requested a review from kaxil June 24, 2025 09:38
@potiuk
Copy link
Member

potiuk commented Jun 24, 2025

This is fine, but, we should make the tests work without the DB and remove all "db_test" markers for them eventually,

I think this is possible.

If I understand correctly - you now create both DB and ENV vars, so it should be just a matter of running conditional code to "only" clear and set "env variables" when DB is not available in the common pytest plugin code. The common plugin code already works fine when there is no DB - but I think what it does it will skip creation of the default connections - but if you just change the common code to skip all DB parts but only set the env vars, then I belive none of the "get_connection" calls should even try to make a DB call and they should simply work when you set --backend none

@amoghrajesh
Copy link
Contributor Author

This is fine, but, we should make the tests work without the DB and remove all "db_test" markers for them eventually,

Yep! That is my goal too

If I understand correctly - you now create both DB and ENV vars, so it should be just a matter of running conditional code to "only" clear and set "env variables" when DB is not available in the common pytest plugin code. The common plugin code already works fine when there is no DB - but I think what it does it will skip creation of the default connections - but if you just change the common code to skip all DB parts but only set the env vars, then I belive none of the "get_connection" calls should even try to make a DB call and they should simply work when you set --backend none

@potiuk quite the opposite actually, i have extracted out logic to not create stuff on DB for tests. (Trying to get to complete coverage).

Right now, for tests, it creates only default connections and regular connections on ENV only.

That is precisely what you propose I believe.

@potiuk
Copy link
Member

potiuk commented Jun 25, 2025

Cool. I misunderstood the 'double' creation :) . The key is to run those tests with none backend - this way you fail with any db access :)

@amoghrajesh
Copy link
Contributor Author

Ah i see, I updated the PR desc a little later, you might have caught the older one!

@amoghrajesh
Copy link
Contributor Author

amoghrajesh commented Jun 25, 2025

Compat is not easy to fix!
🤞🏽 hopefully it works now!

@amoghrajesh
Copy link
Contributor Author

Alright! Finally green

@amoghrajesh amoghrajesh merged commit be822e0 into apache:main Jun 25, 2025
99 checks passed
@amoghrajesh amoghrajesh deleted the migrate-create-default-connections-to-do-env-too branch June 25, 2025 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants