HttpAsyncHook ignores schema from connection in HttpSensorTrigger (defaults to http instead of https) #52319

Open
@albertwangnz

Apache Airflow Provider(s)

http

Versions of Apache Airflow Providers

5.3.0

Apache Airflow version

3.0.2

Operating System

Ubuntu 24.04.2 LTS

Deployment

Virtualenv installation

Deployment details

  • Deployment Type: Virtualenv installation
  • Operating System: Ubuntu 24.04.2 LTS
  • Python Version: 3.12.3
  • Airflow Version: 3.0.2
  • HTTP Provider Version: 5.3.0
  • Database Backend: PostgreSQL 16
  • Secrets Backend: Microsoft Azure Key Vault
  • Authentication: Flask AppBuilder (FAB) with Microsoft Entra ID (SSO)
  • SSL Configuration: Enabled with custom certificates
  • Timezone: Pacific/Auckland
  • Airflow Services Management: systemd unit files for api-server, scheduler, dag-processor, and triggerer
  • Custom Configuration Highlights:
    • Airflow configuration (airflow.cfg) includes:
      • sql_alchemy_conn_secret for DB connection string
      • Azure Key Vault integration for secrets
      • SSL cert/key paths
      • FAB auth manager
    • Environment variables for Azure credentials (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET)
    • Custom webserver_config.py for SSO
    • Firewall configured to allow port 8443

What happened

I encountered an issue while using a deferrable HttpSensor in Airflow 3.0.2. The sensor is configured to use a connection (https_host-has-no-schema) with the following details:

  • Host: dummyjson.com
  • Port: 443
  • Schema: https

During the initial execution, the sensor correctly uses HttpHook to send a GET request to https://dummyjson.com:443/fake_endpoint, receives a 404 response, and defers the task as expected.

However, when the deferred task resumes via HttpSensorTrigger, the trigger internally creates an HttpAsyncHook object. This hook retrieves the connection using self.get_connection(self.http_conn_id), but the returned connection has schema=None. As a result, the final request URL becomes http://dummyjson.com:443/fake_endpoint, which uses the wrong scheme for port 443, and the request fails.

This discrepancy between HttpHook and HttpAsyncHook in handling the connection schema seems to stem from how the connection object is retrieved and interpreted asynchronously. The issue may involve BaseHook.get_connection, Connection, or TaskSDKConnection not properly preserving or propagating the schema field.

Here is a log fragment showing the incorrect behavior. The first request uses https, but the second uses http.

[2025-06-27, 01:46:26] INFO - Connection Retrieved 'https_host-has-no-schema': source="airflow.hooks.base"
[2025-06-27, 01:46:26] DEBUG - Connection Details: 'Connection(conn_id='https_host-has-no-schema', conn_type='http', description=None, host='dummyjson.com', schema='https', login=None, password=None, port=443, extra=None)': source="airflow.hooks.base"
[2025-06-27, 01:46:26] DEBUG - Sending 'GET' to url: https://dummyjson.com:443/fake_endpoint: source="airflow.task.hooks.airflow.providers.http.hooks.http.HttpHook"
......
[2025-06-27, 01:46:31] INFO - Connection Retrieved 'https_host-has-no-schema': source="airflow.hooks.base"
[2025-06-27, 01:46:31] DEBUG - Connection Details: 'Connection(conn_id='https_host-has-no-schema', conn_type='http', description=None, host='dummyjson.com', schema=None, login=None, password=None, port=443, extra=None)': source="airflow.hooks.base"
[2025-06-27, 01:46:31] WARNING - [Try 1 of 3] Request to http://dummyjson.com:443/fake_endpoint failed.: source="airflow.providers.http.hooks.http.HttpAsyncHook"
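
The two URLs in the log are consistent with a hook that falls back to http whenever the connection's schema is empty. The following is a minimal sketch of that fallback, not the provider's actual code: ConnSketch and build_url are hypothetical names, and the rule "use the scheme embedded in host, else schema, else http" is an assumption inferred from the observed URLs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnSketch:
    """Hypothetical stand-in for an Airflow connection (only the fields used here)."""

    host: str
    schema: Optional[str] = None
    port: Optional[int] = None


def build_url(conn: ConnSketch, endpoint: str) -> str:
    """Build a request URL, falling back to http when no scheme is available."""
    if "://" in conn.host:
        # The host already carries its own scheme, so schema is not consulted.
        base_url = conn.host
    else:
        # Assumed fallback: an empty or missing schema becomes "http".
        scheme = conn.schema or "http"
        base_url = f"{scheme}://{conn.host}"
    if conn.port:
        base_url = f"{base_url}:{conn.port}"
    return f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"


# Connection as logged on the first request (schema='https').
first = build_url(ConnSketch("dummyjson.com", "https", 443), "fake_endpoint")
# Connection as logged in the triggerer (schema=None).
second = build_url(ConnSketch("dummyjson.com", None, 443), "fake_endpoint")
print(first)
print(second)
```

Under this assumed rule, the hook itself may be behaving as designed, and the lost schema value on the retrieved connection would be the actual defect.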

What you think should happen instead

Both HttpHook and HttpAsyncHook should respect the schema field defined in the Airflow connection. In this case, the connection https_host-has-no-schema explicitly sets schema=https, so every request, synchronous or asynchronous, should use https:// in the final URL.

How to reproduce

  1. Create an Airflow connection named https_host-has-no-schema with the following settings:

    • Host: dummyjson.com
    • Port: 443
    • Schema: https
    • Leave login, password, and extra fields empty.
  2. Create a DAG with the following code:

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.providers.http.sensors.http import HttpSensor
    from airflow.providers.http.operators.http import HttpOperator
    
    with DAG(
        dag_id="dag_reproduce_issue",
        description="Reproduce",
        start_date=datetime.now() - timedelta(days=1),
        schedule=None,
        catchup=False,
        default_args={
            "retries": 0,
            "retry_delay": timedelta(minutes=1),
        },
        tags=["reproduce_issue"],
    ) as dag:
    
        # Control case: a non-deferred request through the same connection.
        https_operator = HttpOperator(
            task_id="https_operator",
            http_conn_id="https_host-has-no-schema",
            endpoint="products",
            method="GET",
        )
    
        # The endpoint does not exist, so the first poke gets a 404 and the
        # sensor defers; later pokes go through HttpSensorTrigger.
        https_sensor = HttpSensor(
            task_id="https_sensor",
            http_conn_id="https_host-has-no-schema",
            endpoint="fake_endpoint",
            method="GET",
            deferrable=True,
            poke_interval=30,
            timeout=60,
        )
    
        https_operator >> https_sensor
  3. Run the DAG. Observe the following:

    • https_operator sends a request to https://dummyjson.com:443/products and succeeds.
    • https_sensor initially sends a request to https://dummyjson.com:443/fake_endpoint, receives a 404, and defers.
    • When resumed by HttpSensorTrigger, the request is sent to http://dummyjson.com:443/fake_endpoint instead of https.
  4. Check the logs of the triggerer process to confirm the incorrect URL schema.
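
To make the check in step 4 less error-prone, the scheme of each logged request URL can be pulled out with a small script. This is only a sketch: URL_RE and url_schemes are hypothetical names, and the sample lines are copied from the log fragment in this report rather than read from a real log file.

```python
import re

# Matches the first http:// or https:// URL in a log line and captures its scheme.
URL_RE = re.compile(r"(https?)://\S+")


def url_schemes(log_lines):
    """Return the URL scheme of every log line that contains a URL."""
    schemes = []
    for line in log_lines:
        match = URL_RE.search(line)
        if match:
            schemes.append(match.group(1))
    return schemes


# Two lines from the log fragment above: the worker's request, then the triggerer's.
sample = """\
[2025-06-27, 01:46:26] DEBUG - Sending 'GET' to url: https://dummyjson.com:443/fake_endpoint: source="airflow.task.hooks.airflow.providers.http.hooks.http.HttpHook"
[2025-06-27, 01:46:31] WARNING - [Try 1 of 3] Request to http://dummyjson.com:443/fake_endpoint failed.: source="airflow.providers.http.hooks.http.HttpAsyncHook"
""".splitlines()

schemes = url_schemes(sample)
print(schemes)
```

A mix of schemes for the same connection and endpoint indicates that the bug was reproduced.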

Anything else

This issue occurs every time under the following conditions:

  • The Airflow connection's host field does not include a URL schema (i.e., no http:// or https://).
  • The schema field in the connection is explicitly set to https.
  • The HttpSensor is set to deferrable=True. I did not test when deferrable=False.

Under these conditions, the HttpAsyncHook used by HttpSensorTrigger fails to apply the https schema and defaults to http, resulting in incorrect request URLs.

However, if the host is set to https://dummyjson.com (i.e., includes the schema directly in the host field), the issue does not occur. In that case, both the host and port are correctly loaded and used by HttpAsyncHook.
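
As a sketch of that workaround, a connection with the scheme embedded in the host can be defined through Airflow's JSON connection environment variable. The connection id https_host_with_scheme is a hypothetical name chosen for illustration, and this route has not been verified against the triggerer in this report; the variable would also need to be set in the environment of every Airflow service, not only in a one-off script.

```python
import json
import os

# Hypothetical connection id, chosen here only for illustration.
conn_id = "https_host_with_scheme"

# The scheme lives in the host field, so the schema field is left out entirely.
conn = {
    "conn_type": "http",
    "host": "https://dummyjson.com",
    "port": 443,
}

# Airflow reads connections from variables named AIRFLOW_CONN_<CONN_ID in upper case>.
env_name = f"AIRFLOW_CONN_{conn_id.upper()}"
os.environ[env_name] = json.dumps(conn)
print(env_name, os.environ[env_name])
```

The DAG tasks would then reference this connection through http_conn_id.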

To observe this behavior more clearly, set Airflow's logging level to DEBUG. The logs then include the connection details and the full request URL constructed by the hook, which shows whether the schema is being applied correctly.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
