Description
Apache Airflow Provider(s)
http
Versions of Apache Airflow Providers
5.3.0
Apache Airflow version
3.0.2
Operating System
Ubuntu 24.04.2 LTS
Deployment
Virtualenv installation
Deployment details
- Deployment Type: Virtualenv installation
- Operating System: Ubuntu 24.04.2 LTS
- Python Version: 3.12.3
- Airflow Version: 3.0.2
- HTTP Provider Version: 5.3.0
- Database Backend: PostgreSQL 16
- Secrets Backend: Microsoft Azure Key Vault
- Authentication: Flask AppBuilder (FAB) with Microsoft Entra ID (SSO)
- SSL Configuration: Enabled with custom certificates
- Timezone: Pacific/Auckland
- Airflow Services Management: systemd unit files for `api-server`, `scheduler`, `dag-processor`, and `triggerer`
- Custom Configuration Highlights:
  - Airflow configuration (`airflow.cfg`) includes: `sql_alchemy_conn_secret` for the DB connection string, Azure Key Vault integration for secrets, SSL cert/key paths, and the FAB auth manager
  - Environment variables for Azure credentials (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`)
  - Custom `webserver_config.py` for SSO
  - Firewall configured to allow port 8443
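For reference, a minimal sketch of what the relevant `airflow.cfg` fragments look like in this kind of setup (the vault URL, secret name, and prefixes below are hypothetical placeholders, not the actual deployment values):

```ini
[database]
# The _secret suffix makes Airflow resolve the value from the configured secrets backend.
sql_alchemy_conn_secret = airflow-db-connection-string

[secrets]
backend = airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend
backend_kwargs = {"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "vault_url": "https://example-vault.vault.azure.net/"}
```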
What happened
I encountered an issue while using a deferrable `HttpSensor` in Airflow 3.0.2. The sensor is configured to use a connection (`https_host-has-no-schema`) with the following details:

- Host: `dummyjson.com`
- Port: `443`
- Schema: `https`

During the initial execution, the sensor correctly uses `HttpHook` to send a GET request to `https://dummyjson.com:443/fake_endpoint`, receives a 404 response, and defers the task as expected.

However, when the deferred task resumes via `HttpSensorTrigger`, the trigger internally creates an `HttpAsyncHook` object. This hook retrieves the connection using `self.get_connection(self.http_conn_id)` but appears to lose the `schema` value. As a result, the final request URL becomes `http://dummyjson.com:443/fake_endpoint`, which is incorrect and causes unexpected behavior.

This discrepancy between `HttpHook` and `HttpAsyncHook` in handling the connection schema seems to stem from how the connection object is retrieved and interpreted asynchronously. The issue may involve `BaseHook.get_connection`, `Connection`, or `TaskSDKConnection` not properly preserving or propagating the `schema` field.
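To make the suspected failure mode concrete, here is an illustrative sketch. It is not the provider's actual code: `build_base_url` is a hypothetical helper that only mimics the common "fall back to `http` when the schema is missing" URL-building pattern.

```python
# Illustrative only: mimics the usual "schema or http" fallback when building a base URL.
from airflow.models.connection import Connection


def build_base_url(conn: Connection) -> str:
    # Hypothetical helper, not provider code: default to "http" when schema is missing.
    schema = conn.schema if conn.schema else "http"
    host = conn.host or ""
    base_url = host if "://" in host else f"{schema}://{host}"
    return f"{base_url}:{conn.port}" if conn.port else base_url


# What HttpHook appears to receive (schema preserved) vs. what HttpAsyncHook appears
# to receive after the trigger resumes the task (schema=None), per the logs below.
sync_conn = Connection(conn_id="c", conn_type="http", host="dummyjson.com", schema="https", port=443)
async_conn = Connection(conn_id="c", conn_type="http", host="dummyjson.com", schema=None, port=443)

print(build_base_url(sync_conn))   # https://dummyjson.com:443
print(build_base_url(async_conn))  # http://dummyjson.com:443  <- the URL seen in the triggerer logs
```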
Here’s a log fragment showing the incorrect behavior: the first request uses `https`, but the second request uses `http`.
```
[2025-06-27, 01:46:26] INFO - Connection Retrieved 'https_host-has-no-schema': source="airflow.hooks.base"
[2025-06-27, 01:46:26] DEBUG - Connection Details: 'Connection(conn_id='https_host-has-no-schema', conn_type='http', description=None, host='dummyjson.com', schema='https', login=None, password=None, port=443, extra=None)': source="airflow.hooks.base"
[2025-06-27, 01:46:26] DEBUG - Sending 'GET' to url: https://dummyjson.com:443/fake_endpoint: source="airflow.task.hooks.airflow.providers.http.hooks.http.HttpHook"
......
[2025-06-27, 01:46:31] INFO - Connection Retrieved 'https_host-has-no-schema': source="airflow.hooks.base"
[2025-06-27, 01:46:31] DEBUG - Connection Details: 'Connection(conn_id='https_host-has-no-schema', conn_type='http', description=None, host='dummyjson.com', schema=None, login=None, password=None, port=443, extra=None)': source="airflow.hooks.base"
[2025-06-27, 01:46:31] WARNING - [Try 1 of 3] Request to http://dummyjson.com:443/fake_endpoint failed.: source="airflow.providers.http.hooks.http.HttpAsyncHook"
```
What you think should happen instead
The expected behavior is that both `HttpHook` and `HttpAsyncHook` consistently respect the `schema` field defined in the Airflow connection. In this case, the connection `https_host-has-no-schema` explicitly sets `schema=https`, so all HTTP requests, whether synchronous or asynchronous, should use `https://` in the final URL.
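For the synchronous path this is easy to confirm. A quick sketch, assuming the `https_host-has-no-schema` connection from this report is resolvable in the environment:

```python
from airflow.providers.http.hooks.http import HttpHook

hook = HttpHook(method="GET", http_conn_id="https_host-has-no-schema")
hook.get_conn()       # builds hook.base_url from the connection fields
print(hook.base_url)  # expected: https://dummyjson.com:443
```

The asynchronous path resumed by the trigger should produce the same base URL.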
How to reproduce
- Create an Airflow connection named `https_host-has-no-schema` with the following settings (a scripted alternative is sketched after these steps):
  - Host: `dummyjson.com`
  - Port: `443`
  - Schema: `https`
  - Leave login, password, and extra fields empty.
- Create a DAG with the following code:

  ```python
  from datetime import datetime, timedelta

  from airflow import DAG
  from airflow.providers.http.sensors.http import HttpSensor
  from airflow.providers.http.operators.http import HttpOperator

  with DAG(
      dag_id="dag_reproduce_issue",
      description="Reproduce",
      start_date=datetime.now() - timedelta(days=1),
      schedule=None,
      catchup=False,
      default_args={
          "retries": 0,
          "retry_delay": timedelta(minutes=1),
      },
      tags=["reproduce_issue"],
  ) as dag:
      https_operator = HttpOperator(
          task_id="https_operator",
          http_conn_id="https_host-has-no-schema",
          endpoint="products",
          method="GET",
      )

      https_sensor = HttpSensor(
          task_id="https_sensor",
          http_conn_id="https_host-has-no-schema",
          endpoint="fake_endpoint",
          method="GET",
          deferrable=True,
          poke_interval=30,
          timeout=60,
      )

      https_operator >> https_sensor
  ```
- Run the DAG. Observe the following:
  - `https_operator` sends a request to `https://dummyjson.com:443/products` and succeeds.
  - `https_sensor` initially sends a request to `https://dummyjson.com:443/fake_endpoint`, receives a 404, and defers.
  - When resumed by `HttpSensorTrigger`, the request is sent to `http://dummyjson.com:443/fake_endpoint` instead of `https://dummyjson.com:443/fake_endpoint`.
- Check the logs of the triggerer process to confirm the incorrect URL schema.
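As the scripted alternative to the first step above, here is a hedged sketch that creates the test connection directly in the metadata database (the same connection can also be created through the UI or the `airflow connections add` CLI):

```python
from airflow.models.connection import Connection
from airflow.utils.session import create_session

# Create the test connection programmatically; equivalent to filling in the UI form in the first step.
conn = Connection(
    conn_id="https_host-has-no-schema",
    conn_type="http",
    host="dummyjson.com",  # no scheme in the host field
    schema="https",        # scheme provided only via the Schema field
    port=443,
)
with create_session() as session:
    session.add(conn)
```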
Anything else
This issue occurs every time under the following conditions:

- The Airflow connection's host field does not include a URL schema (i.e., no `http://` or `https://`).
- The schema field in the connection is explicitly set to `https`.
- The `HttpSensor` is set to `deferrable=True`. I did not test with `deferrable=False`.

Under these conditions, the `HttpAsyncHook` used by `HttpSensorTrigger` fails to apply the `https` schema and defaults to `http`, resulting in incorrect request URLs.

However, if the host is set to `https://dummyjson.com` (i.e., the schema is included directly in the host field), the issue does not occur. In that case, both the host and port are correctly loaded and used by `HttpAsyncHook`.
To better observe this behavior, enable the DEBUG logging level in Airflow. The logs will then show the full request URL constructed by the hook and confirm whether the schema is being applied correctly.
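For anyone triaging, a hedged sketch of exercising the async code path directly, outside a DAG run. It assumes the connection above exists and that the endpoint returns a 404, so `run` is expected to raise after its retries:

```python
import asyncio

from airflow.providers.http.hooks.http import HttpAsyncHook


async def main() -> None:
    hook = HttpAsyncHook(method="GET", http_conn_id="https_host-has-no-schema")
    try:
        await hook.run(endpoint="fake_endpoint")
    except Exception as exc:
        # The hook's retry warnings include the attempted URL, which shows
        # whether http:// or https:// was used (compare with the logs above).
        print(exc)


asyncio.run(main())
```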
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct