Skip to content

Introducing fixture to create Connections without DB in provider tests #51930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 22, 2025

Conversation

amoghrajesh
Copy link
Contributor

@amoghrajesh amoghrajesh commented Jun 19, 2025

Why?

While working on moving basehook to task sdk (#51873), I realized that most providers set up their connections using db.merge_conn() with thousands of occurrences across the codebase.

db.merge_conn() is a utility that creates connections in the database for provider tests:

  • Overall slows down test execution due to database transactions
  • Can cause test isolation issues when connections persist between tests
  • Require database setup/teardown, making tests more complex
  • Can lead to flaky tests due to DB access issues

My initial idea was to create a fixture that could intercept, find, and replace those calls with task SDK comms calls automatically. However, I decided against this approach because it didn't add much value and was not maintainable.

How?

Created a new pytest fixture that leveraged the fact that Airflow has a Env backend setup by default with precedence being higher than that of the database: https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#storing-connections-in-environment-variables.

The create_connection_without_db fixture sets AIRFLOW_CONN_{conn_id.upper()} to conn.get_uri() which can be utilised during lookup.

Benefits:

  • Connections are resolved from environment variables first
  • No database state is required for connection setup
  • Tests are more isolated and predictable

Effort involved:

There were various kinds of occurrences of db.merge_conn:

  1. In unit tests:
    • setup_class
    • setup_method
    • setup_module
    • In local functions
    • And some arbitrary ones
  2. In system tests
  3. In example dags!

TODO

  • Handle occurrences in unit tests
  • Have some code level TODOs, need to be checked in follow up
  • Handle occurrences In system tests
  • Consider improving the fixture to take a list of connections and add them at once.
  • Handle for telegram provider too: providers/telegram/tests/unit/telegram/hooks/test_telegram.py

Error:

_ TestTelegramHook.test_should_raise_exception_if_chat_id_is_not_provided_anywhere _
providers/telegram/tests/unit/telegram/hooks/test_telegram.py:93: in test_should_raise_exception_if_chat_id_is_not_provided_anywhere
    hook.send_message({"text": "test telegram message"})
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:338: in wrapped_f
    return copy(f, *args, **kw)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:477: in __call__
    do = self.iter(retry_state=retry_state)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:378: in iter
    result = action(retry_state)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:400: in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
/usr/local/lib/python3.9/concurrent/futures/_base.py:439: in result
    return self.__get_result()
/usr/local/lib/python3.9/concurrent/futures/_base.py:391: in __get_result
    raise self._exception
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:480: in __call__
    result = fn(*args, **kwargs)
providers/telegram/src/airflow/providers/telegram/hooks/telegram.py:159: in send_message
    response = asyncio.run(self.connection.send_message(**kwargs))
/usr/local/lib/python3.9/asyncio/runners.py:37: in run
    raise ValueError("a coroutine was expected, got {!r}".format(main))
E   ValueError: a coroutine was expected, got <MagicMock name='get_conn().send_message()' id='140457991957568'>
----------------------------- Captured stdout call -----------------------------
[2025-06-20T11:32:10.235+0000] {base.py:65} INFO - Connection Retrieved 'telegram_default'
[2025-06-20T11:32:10.236+0000] {base.py:65} INFO - Connection Retrieved 'telegram_default'
------------------------------ Captured log call -------------------------------
INFO     airflow.hooks.base:base.py:65 Connection Retrieved 'telegram_default'
INFO     airflow.hooks.base:base.py:65 Connection Retrieved 'telegram_default'
_ TestTelegramHook.test_should_raise_exception_if_chat_id_is_not_provided_anywhere_when_sending_file _
providers/telegram/tests/unit/telegram/hooks/test_telegram.py:208: in test_should_raise_exception_if_chat_id_is_not_provided_anywhere_when_sending_file
    hook.send_file({"file": "/file/to/path"})
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:338: in wrapped_f
    return copy(f, *args, **kw)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:477: in __call__
    do = self.iter(retry_state=retry_state)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:378: in iter
    result = action(retry_state)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:400: in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
/usr/local/lib/python3.9/concurrent/futures/_base.py:439: in result
    return self.__get_result()
/usr/local/lib/python3.9/concurrent/futures/_base.py:391: in __get_result
    raise self._exception
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:480: in __call__
    result = fn(*args, **kwargs)
providers/telegram/src/airflow/providers/telegram/hooks/telegram.py:189: in send_file
    response = asyncio.run(self.connection.send_document(**kwargs))
/usr/local/lib/python3.9/asyncio/runners.py:37: in run
    raise ValueError("a coroutine was expected, got {!r}".format(main))
E   ValueError: a coroutine was expected, got <MagicMock name='get_conn().send_document()' id='140457990581216'>

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@potiuk
Copy link
Member

potiuk commented Jun 19, 2025

nice :)

@potiuk
Copy link
Member

potiuk commented Jun 19, 2025

Hopefully we will be able to move most of the tests out of the "Db" zone this way :)

@amoghrajesh
Copy link
Contributor Author

WARNING: This one is going to get crazy with the number of changes. Planning to use env vars instead of DB.

@potiuk
Copy link
Member

potiuk commented Jun 19, 2025

WARNING: This one is going to get crazy with the number of changes. Planning to use env vars instead of DB.

Yeah. It's going to be big. Maybe consider implementing a parallel solution and make an issue where others will convert provider by provider? That would be way more manageable (and if conversion is simple, would be rather fast).

@amoghrajesh
Copy link
Contributor Author

WARNING: This one is going to get crazy with the number of changes. Planning to use env vars instead of DB.

Yeah. It's going to be big. Maybe consider implementing a parallel solution and make an issue where others will convert provider by provider? That would be way more manageable (and if conversion is simple, would be rather fast).

Let me create an MVP off of this and try to figure out if it needs that kind of splitting.

@amoghrajesh
Copy link
Contributor Author

Ok, getting there now. Probably really close this time. I am running into some errors with the telegram provider which I cannot seem to figure out.

_ TestTelegramHook.test_should_raise_exception_if_chat_id_is_not_provided_anywhere _
providers/telegram/tests/unit/telegram/hooks/test_telegram.py:93: in test_should_raise_exception_if_chat_id_is_not_provided_anywhere
    hook.send_message({"text": "test telegram message"})
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:338: in wrapped_f
    return copy(f, *args, **kw)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:477: in __call__
    do = self.iter(retry_state=retry_state)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:378: in iter
    result = action(retry_state)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:400: in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
/usr/local/lib/python3.9/concurrent/futures/_base.py:439: in result
    return self.__get_result()
/usr/local/lib/python3.9/concurrent/futures/_base.py:391: in __get_result
    raise self._exception
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:480: in __call__
    result = fn(*args, **kwargs)
providers/telegram/src/airflow/providers/telegram/hooks/telegram.py:159: in send_message
    response = asyncio.run(self.connection.send_message(**kwargs))
/usr/local/lib/python3.9/asyncio/runners.py:37: in run
    raise ValueError("a coroutine was expected, got {!r}".format(main))
E   ValueError: a coroutine was expected, got <MagicMock name='get_conn().send_message()' id='140457991957568'>
----------------------------- Captured stdout call -----------------------------
[2025-06-20T11:32:10.235+0000] {base.py:65} INFO - Connection Retrieved 'telegram_default'
[2025-06-20T11:32:10.236+0000] {base.py:65} INFO - Connection Retrieved 'telegram_default'
------------------------------ Captured log call -------------------------------
INFO     airflow.hooks.base:base.py:65 Connection Retrieved 'telegram_default'
INFO     airflow.hooks.base:base.py:65 Connection Retrieved 'telegram_default'
_ TestTelegramHook.test_should_raise_exception_if_chat_id_is_not_provided_anywhere_when_sending_file _
providers/telegram/tests/unit/telegram/hooks/test_telegram.py:208: in test_should_raise_exception_if_chat_id_is_not_provided_anywhere_when_sending_file
    hook.send_file({"file": "/file/to/path"})
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:338: in wrapped_f
    return copy(f, *args, **kw)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:477: in __call__
    do = self.iter(retry_state=retry_state)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:378: in iter
    result = action(retry_state)
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:400: in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
/usr/local/lib/python3.9/concurrent/futures/_base.py:439: in result
    return self.__get_result()
/usr/local/lib/python3.9/concurrent/futures/_base.py:391: in __get_result
    raise self._exception
/usr/local/lib/python3.9/site-packages/tenacity/__init__.py:480: in __call__
    result = fn(*args, **kwargs)
providers/telegram/src/airflow/providers/telegram/hooks/telegram.py:189: in send_file
    response = asyncio.run(self.connection.send_document(**kwargs))
/usr/local/lib/python3.9/asyncio/runners.py:37: in run
    raise ValueError("a coroutine was expected, got {!r}".format(main))
E   ValueError: a coroutine was expected, got <MagicMock name='get_conn().send_document()' id='140457990581216'>

Will add it part of TODO and handle there.

@amoghrajesh
Copy link
Contributor Author

@potiuk I am close with this one, do you wanna take a look?

@amoghrajesh amoghrajesh requested a review from jason810496 June 20, 2025 13:31
Copy link
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI is green.

@amoghrajesh
Copy link
Contributor Author

Oh the compat tests! I made changes in #51953, which should fix this but I wonder how i can apply those here.

@amoghrajesh
Copy link
Contributor Author

OK, I decided to modify the fixture to use: as_json because it’s independent of the core changes in #51953 and generally quite stable. The only concern I mentioned is in this comment, but I don’t really have much of a choice at this point to avoid it from happening, it ends up being a bit of a chicken and egg situation here. Core isn't released but provider wants to use it.

Maybe if we see issues in future we can get back to uri repr.

CC @ashb

@amoghrajesh
Copy link
Contributor Author

Finally green!

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIIIIICE!

@potiuk
Copy link
Member

potiuk commented Jun 21, 2025

Shall we merge ;) ?

@amoghrajesh
Copy link
Contributor Author

Merging it, thanks for the reviews @potiuk @ashb @jason810496!

Yep @potiuk on it!

@amoghrajesh amoghrajesh merged commit 8e21e7c into apache:main Jun 22, 2025
98 checks passed
@amoghrajesh amoghrajesh deleted the replace-db-merge-conn branch June 22, 2025 08:32
@amoghrajesh
Copy link
Contributor Author

Will work on the follow up items starting next week

@potiuk
Copy link
Member

potiuk commented Jun 22, 2025

Will work on the follow up items starting next week

Let me see how much od the providers tests I can now move to be non-db tests :D.... That's on me

potiuk added a commit to potiuk/airflow that referenced this pull request Jun 22, 2025
After apache#51930, we can now remove "pytest.mark.db_test" in all providers
that used db only to create connections. This PR is the "trial" attempt
of doing so and it verifies that it is a viable thing to do. It:

* removes all pytest.mark.db_test markers from airbyte
* removes / reshuffles many pytest.mark.db_tests in amazon
* adds pre-commit that we can use to guard that no new
  pytest.mark.db_test markers will be added in providers that we
  already cleaned up (for example airbyte)

This PR, when merged will be followed up with an issue where we will ask
contributors to apply the same approach to all the remaining providers -
where I will describe in detail the process of removing the markers.

This is part of the apache#42632 which has the long-term target of making all
the provider tests non-db tests and simplifying our test setup.
@potiuk
Copy link
Member

potiuk commented Jun 22, 2025

Will work on the follow up items starting next week

Let me see how much od the providers tests I can now move to be non-db tests :D.... That's on me

First phase - amazon + airbyte here #52017

potiuk added a commit to potiuk/airflow that referenced this pull request Jun 22, 2025
After apache#51930, we can now remove "pytest.mark.db_test" in all providers
that used db only to create connections. This PR is the "trial" attempt
of doing so and it verifies that it is a viable thing to do. It:

* removes all pytest.mark.db_test markers from airbyte
* removes / reshuffles many pytest.mark.db_tests in amazon
* adds pre-commit that we can use to guard that no new
  pytest.mark.db_test markers will be added in providers that we
  already cleaned up (for example airbyte)

This PR, when merged will be followed up with an issue where we will ask
contributors to apply the same approach to all the remaining providers -
where I will describe in detail the process of removing the markers.

This is part of the apache#42632 which has the long-term target of making all
the provider tests non-db tests and simplifying our test setup.
potiuk added a commit that referenced this pull request Jun 22, 2025
…le (#52017)

After #51930, we can now remove "pytest.mark.db_test" in all providers
that used db only to create connections. This PR is the "trial" attempt
of doing so and it verifies that it is a viable thing to do. It:

* removes all pytest.mark.db_test markers from airbyte
* removes / reshuffles many pytest.mark.db_tests in amazon
* adds pre-commit that we can use to guard that no new
  pytest.mark.db_test markers will be added in providers that we
  already cleaned up (for example airbyte)

This PR, when merged will be followed up with an issue where we will ask
contributors to apply the same approach to all the remaining providers -
where I will describe in detail the process of removing the markers.

This is part of the #42632 which has the long-term target of making all
the provider tests non-db tests and simplifying our test setup.
@amoghrajesh
Copy link
Contributor Author

@potiuk do you think we have to make changes to the system tests too?
I think those are using the DB and those seem to be ok?

@potiuk
Copy link
Member

potiuk commented Jun 23, 2025

@potiuk do you think we have to make changes to the system tests too? I think those are using the DB and those seem to be ok?

Well. Not really - those are regular DAGs, so they generally should not need (and cannot) access the DB. So we should adapt them too.

@amoghrajesh
Copy link
Contributor Author

On it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants