Add MwaaTaskSensor to Amazon Provider Package #51719

seanghaeli · 2025-06-14T05:01:42Z

The MwaaTaskSensor waits for a task instance of a DAG in an MWAA environment to complete. This PR includes the implementation, unit tests, system tests, and docs. It is modeled on MwaaDagRunSensor.

The system test is also modified to set deferrable=True on MwaaTriggerDagRunOperator. This tests the MwaaTaskSensor and MwaaDagRunSensor sensors during execution of DAG Run rather than only afterwards.

eladkal · 2025-06-15T10:58:03Z

providers/amazon/src/airflow/providers/amazon/aws/triggers/mwaa.py

+        failure_states: Collection[str] | None = None,
+        waiter_delay: int = 60,
+        waiter_max_attempts: int = 720,
+        aws_conn_id: str | None = None,


This isn't consistent with what was discussed on #51196, is it?

eladkal · 2025-06-15T11:05:53Z

providers/amazon/src/airflow/providers/amazon/aws/triggers/mwaa.py

+        self.success_states = set(success_states) if success_states else {TaskInstanceState.SUCCESS.value}
+        self.failure_states = set(failure_states) if failure_states else {TaskInstanceState.FAILED.value}
+
+        if len(self.success_states & self.failure_states):
+            raise ValueError("success_states and failure_states must not have any values in common")
+
+        in_progress_states = {s.value for s in TaskInstanceState} - self.success_states - self.failure_states


Is this logic right? If we fall back to the defaults, it will treat skipped and removed as in-progress states.
Also, there is no protection here against a new state being added to task instance in the future. For example, we are discussing #12199.

I suggest adding a defensive test around new states so we'll know to modify the code here, or maybe we can consider adding more classes that categorize states, similar to

airflow/task-sdk/src/airflow/sdk/api/datamodels/_generated.py

Lines 415 to 419 in 083e03a

class TerminalTIState(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    REMOVED = "removed"

There is also upstream_failed, for example, and other terminal states that will cause this to wait forever, right?
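To make the concern concrete, here is a minimal, self-contained sketch of the same set arithmetic. `TaskState` is a local stand-in for `TaskInstanceState` with an assumed subset of its values, and `classify` is a hypothetical helper, not code from the PR.

```python
from enum import Enum


class TaskState(str, Enum):
    """Local stand-in for TaskInstanceState (assumed subset of values)."""

    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    REMOVED = "removed"
    UPSTREAM_FAILED = "upstream_failed"


def classify(success_states=None, failure_states=None):
    """Mirror the trigger's logic: anything not success or failure is in progress."""
    success = set(success_states) if success_states else {TaskState.SUCCESS.value}
    failure = set(failure_states) if failure_states else {TaskState.FAILED.value}
    if success & failure:
        raise ValueError("success_states and failure_states must not have any values in common")
    in_progress = {s.value for s in TaskState} - success - failure
    return success, failure, in_progress


# With the defaults, states that never change again are still "in progress",
# so a waiter polling on them would run until it exhausts its attempts.
_, _, in_progress = classify()
never_finishing = sorted(in_progress & {"skipped", "removed", "upstream_failed"})
print(never_finishing)
```

In this sketch, passing the terminal states explicitly, for example `failure_states=["failed", "upstream_failed"]`, removes them from the in-progress set.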

ramitkataria · 2025-06-18T00:10:28Z

> This tests the MwaaTaskSensor and MwaaDagRunSensor sensors during execution of DAG Run rather than only afterwards

Wouldn't it still wait for the DAG run to complete? I think in this case the waiting would always happen in deferrable mode instead of using the config value of operators.default_deferrable. Relying on the config value would probably be preferable, since we could then test both cases by changing the config value alone, without modifying the code.

If we want to test the sensor during execution, we could run the DAG again in another task before the sensor task, but I'm not sure we want to be that exhaustive in system tests.
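A rough sketch of the trade-off, using the standard library's `configparser` as a stand-in for Airflow's configuration object. `resolve_deferrable` is a hypothetical helper; only the section and option names (`operators`, `default_deferrable`) come from the discussion.

```python
import configparser


def resolve_deferrable(cfg, explicit=None):
    """Use the explicit argument if given, otherwise fall back to the config value."""
    if explicit is not None:
        return explicit
    return cfg.getboolean("operators", "default_deferrable", fallback=False)


cfg = configparser.ConfigParser()
cfg.read_dict({"operators": {"default_deferrable": "true"}})

# Left unset in the test, the mode follows the config value.
from_config = resolve_deferrable(cfg)

# Flipping the config switches the mode without touching the test code...
cfg.set("operators", "default_deferrable", "false")
after_flip = resolve_deferrable(cfg)

# ...whereas a hard-coded deferrable=True ignores the config entirely.
hardcoded_after_flip = resolve_deferrable(cfg, explicit=True)
```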

seanghaeli · 2025-06-18T19:52:52Z

> Wouldn't it still wait for the dag run to complete?

@ramitkataria With MwaaTriggerDagRunOperator's deferrable=True, wouldn't it proceed to the task sensor without waiting for the dag run to be done?

ramitkataria · 2025-06-19T20:48:04Z

> > Wouldn't it still wait for the dag run to complete?
>
> @ramitkataria With MwaaTriggerDagRunOperator's deferrable=True, wouldn't it proceed to the task sensor without waiting for the dag run to be done?

We also discussed this offline, but in short: the sensor task would still wait for this task, because the two are in a chain and the sensor task depends on it.

… adjust the default value of in base class to . - Add defensive test around adding more task instance states to keep of the MwaaTaskCompletedTrigger up to date. - Fix issue where of the MwaaTaskSensor derives to instead of type. - Modify documentation to clearly indicate that the MwaaTaskSensor is meant to sense tasks across different MWAA environments. - Make an optional parameter, where it defaults to the latest dag run. - Externally fetch the task ID variable. - Test the sensor while a DAG Run is still in progress.

seanghaeli · 2025-06-24T01:06:17Z

I see that the commit message is rendering weird above so I'll rewrite it here for clarity:

- Comply with PR #51196 (Rds Operator pass custom conn_id to superclass): explicitly pass aws_conn_id to the superclass, and change the default value of aws_conn_id in the base class to aws_default.
- Add a defensive test around new task instance states to keep in_progress_states of the MwaaTaskCompletedTrigger up to date.
- Fix an issue where waiter_delay of the MwaaTaskSensor resolved to a float instead of an int.
- Modify the documentation to clearly indicate that the MwaaTaskSensor is meant to sense tasks across different MWAA environments.
- Make external_dag_run_id an optional parameter that defaults to the latest DAG run.
- Externally fetch the task ID variable.
- Test the sensor while a DAG run is still in progress.
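For readers unfamiliar with the "defensive test" idea, one possible shape is sketched below. It is not the test added in this PR: `TaskState`, `FutureTaskState`, `REVIEWED_STATES`, and `unreviewed_states` are all hypothetical names, and the enum members are an assumed subset of the real task instance states.

```python
from enum import Enum


class TaskState(str, Enum):
    """Stand-in for the real task instance state enum (assumed subset)."""

    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


# Snapshot of the states the trigger code has been reviewed against.
REVIEWED_STATES = frozenset({"running", "success", "failed", "skipped"})


def unreviewed_states(enum_cls):
    """Return state values that exist in the enum but were never reviewed."""
    return {s.value for s in enum_cls} - REVIEWED_STATES


def test_no_unreviewed_states():
    # Fails as soon as someone adds a state, prompting a review of in_progress_states.
    assert unreviewed_states(TaskState) == set()


class FutureTaskState(str, Enum):
    """Simulates the enum after a new, made-up state has been added."""

    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    NEW_STATE = "new_state"


test_no_unreviewed_states()
newly_added = unreviewed_states(FutureTaskState)
```

The point of the snapshot is that the test fails loudly when the enum grows, instead of the new state silently landing in the in-progress set.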

o-nikolas · 2025-06-24T18:30:00Z

providers/amazon/src/airflow/providers/amazon/aws/sensors/mwaa.py

@@ -132,7 +132,7 @@ def poke(self, context: Context) -> bool:

        if state in self.failure_states:
            raise AirflowException(
-                f"The DAG run {self.external_dag_run_id} of DAG {self.external_dag_id} in MWAA environment {self.external_env_name} "
+                f"The DAG run {self.external_dag_run_id} of DAG {self.external_dag_id} in MWAA environment {self.external_env_name}"


Why did you delete the space here? The env name and the word "failed" no longer have a space between them. Or does the env name already include a trailing space?
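The effect is easy to reproduce in isolation, because Python joins adjacent string literals without inserting a separator. The values below are made up for illustration, and the second line of the message is assumed to begin with "failed", as in the task sensor's message.

```python
env_name = "my-env"  # hypothetical environment name
state = "failed"

# Adjacent f-strings are concatenated exactly as written.
without_space = (
    f"in MWAA environment {env_name}"
    f"failed with state: {state}"
)

# Keeping the trailing space on the first literal restores the word boundary.
with_space = (
    f"in MWAA environment {env_name} "
    f"failed with state: {state}"
)

print(without_space)
print(with_space)
```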

o-nikolas · 2025-06-24T18:31:10Z

providers/amazon/src/airflow/providers/amazon/aws/sensors/mwaa.py

+        For more information on how to use this sensor, take a look at the guide:
+        :ref:`howto/sensor:MwaaTaskSensor`
+
+    :param external_env_name: The external MWAA environment name that contains the DAG Run you want to wait for


Suggested change

-    :param external_env_name: The external MWAA environment name that contains the DAG Run you want to wait for
+    :param external_env_name: The external MWAA environment name that contains the Task Instance you want to wait for

Here and below as well. This sensor waits for a single task, not the whole DAG run. I'm assuming this is just a copy/paste from the sensor above.

o-nikolas · 2025-06-24T18:35:14Z

providers/amazon/src/airflow/providers/amazon/aws/sensors/mwaa.py

+        if state in self.failure_states:
+            raise AirflowException(
+                f"The task {self.external_task_id} of DAG run {self.external_dag_run_id} of DAG {self.external_dag_id} in MWAA environment {self.external_env_name}"
+                f"failed with state: {state}"


Same as the comment above: no space?

o-nikolas · 2025-06-24T18:37:01Z

providers/amazon/src/airflow/providers/amazon/aws/triggers/base.py

@@ -80,7 +80,7 @@ def __init__(
        waiter_delay: int,
        waiter_max_attempts: int,
        waiter_config_overrides: dict[str, Any] | None = None,
-        aws_conn_id: str | None,
+        aws_conn_id: str | None = "aws_default",


I think this is probably a good change, but this is the base trigger, so it will affect all AWS triggers. I'm curious what caused you to modify this one?
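As a rough illustration of the blast radius, the sketch below uses hypothetical classes (`BaseWaiterTrigger`, `ForwardingTrigger`, `NonForwardingTrigger`), not the provider's real ones. A default on the base class only takes effect for subclasses that do not forward their own `aws_conn_id`.

```python
class BaseWaiterTrigger:
    """Hypothetical stand-in for a shared AWS base trigger."""

    def __init__(self, *, waiter_delay, waiter_max_attempts, aws_conn_id="aws_default"):
        self.waiter_delay = waiter_delay
        self.waiter_max_attempts = waiter_max_attempts
        self.aws_conn_id = aws_conn_id


class ForwardingTrigger(BaseWaiterTrigger):
    """Passes its own aws_conn_id up explicitly, so the base default is never used."""

    def __init__(self, *, aws_conn_id=None, waiter_delay=60, waiter_max_attempts=720):
        super().__init__(
            waiter_delay=waiter_delay,
            waiter_max_attempts=waiter_max_attempts,
            aws_conn_id=aws_conn_id,
        )


class NonForwardingTrigger(BaseWaiterTrigger):
    """Does not pass aws_conn_id, so it silently picks up the base default."""

    def __init__(self, *, waiter_delay=60, waiter_max_attempts=720):
        super().__init__(waiter_delay=waiter_delay, waiter_max_attempts=waiter_max_attempts)


forwarded = ForwardingTrigger().aws_conn_id
inherited = NonForwardingTrigger().aws_conn_id
```

Under these assumptions, a subclass that already forwards its argument is unaffected by the new default, while one that omits it changes behavior.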

o-nikolas · 2025-06-24T18:39:47Z

providers/amazon/src/airflow/providers/amazon/aws/triggers/mwaa.py

+    """
+    Trigger when an MWAA Task is complete.
+
+    :param external_env_name: The external MWAA environment name that contains the DAG Run you want to wait for


Suggested change

-    :param external_env_name: The external MWAA environment name that contains the DAG Run you want to wait for
+    :param external_env_name: The external MWAA environment name that contains the Task Instance you want to wait for

Same as in the sensor class: these param descriptions need slight updates to describe waiting for a task rather than a DAG run.

o-nikolas · 2025-06-24T18:45:45Z

providers/amazon/src/airflow/providers/amazon/aws/triggers/mwaa.py

+        self.success_states = set(success_states) if success_states else {TaskInstanceState.SUCCESS.value}
+        self.failure_states = set(failure_states) if failure_states else {TaskInstanceState.FAILED.value}
+
+        if len(self.success_states & self.failure_states):
+            raise ValueError("success_states and failure_states must not have any values in common")
+
+        in_progress_states = {s.value for s in TaskInstanceState} - self.success_states - self.failure_states


There is also upstream_failed, for example, and other terminal states that will cause this to wait forever, right?

seanghaeli added 2 commits June 13, 2025 21:17

Add MwaaTaskSensor to Amazon Provider Package

9f9a5c1

include pre-commit hooks

c6cfc12

seanghaeli requested review from eladkal and o-nikolas as code owners June 14, 2025 05:01

boring-cyborg bot added the area:providers, kind:documentation, and provider:amazon labels Jun 14, 2025

eladkal requested changes Jun 15, 2025

ramitkataria reviewed Jun 17, 2025

ramitkataria reviewed Jun 17, 2025

o-nikolas reviewed Jun 24, 2025
