Use base AWS classes in Glue Trigger / Sensor and implement custom waiter #52243

dominikhei · 2025-06-25T12:00:58Z

I have implemented a custom waiter and ported the sensor and trigger to the base AWS classes. For the naming of parameters and what to include in the docstrings, I looked at the already existing Glue ones.

On that note: I noticed that all operators use the max_attempts and poll_interval parameters (although often under different names), which are later passed to the trigger. Wouldn't it make sense to move them to the AwsBaseOperator too?
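For readers unfamiliar with custom waiters: botocore describes a waiter as a JSON document that names an operation to poll and a list of acceptors mapping response values to success, failure, or retry. The sketch below shows roughly what a Glue job-run waiter could look like, written as a Python dict and evaluated by a small stand-in matcher. The waiter name `job_complete`, the delay and attempt values, the chosen acceptors, and the `classify` helper are illustrative assumptions, not the definition added in this PR.

```python
from typing import Any

# Hypothetical waiter definition in botocore's waiter-config (version 2) layout.
WAITER_CONFIG: dict[str, Any] = {
    "version": 2,
    "waiters": {
        "job_complete": {  # assumed name, for illustration only
            "operation": "GetJobRun",
            "delay": 60,
            "maxAttempts": 75,
            "acceptors": [
                {"matcher": "path", "argument": "JobRun.JobRunState",
                 "expected": "SUCCEEDED", "state": "success"},
                {"matcher": "path", "argument": "JobRun.JobRunState",
                 "expected": "FAILED", "state": "failure"},
                {"matcher": "path", "argument": "JobRun.JobRunState",
                 "expected": "TIMEOUT", "state": "failure"},
            ],
        }
    },
}


def classify(response: dict[str, Any], waiter_name: str = "job_complete") -> str:
    """Stand-in for acceptor matching: return 'success', 'failure' or 'retry'."""
    for acceptor in WAITER_CONFIG["waiters"][waiter_name]["acceptors"]:
        value: Any = response
        # Walk the dotted path, e.g. "JobRun.JobRunState".
        for key in acceptor["argument"].split("."):
            value = value.get(key, {}) if isinstance(value, dict) else {}
        if value == acceptor["expected"]:
            return acceptor["state"]
    # No acceptor matched, so the waiter would poll again.
    return "retry"
```

A response such as `{"JobRun": {"JobRunState": "RUNNING"}}` matches no acceptor here, so the loop would keep polling until the attempt budget runs out.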

o-nikolas · 2025-06-26T22:33:01Z

providers/amazon/src/airflow/providers/amazon/aws/operators/glue.py

@@ -254,7 +265,7 @@ def execute_complete(self, context: Context, event: dict[str, Any] | None = None

        if validated_event["status"] != "success":
            raise AirflowException(f"Error in glue job: {validated_event}")
-        return validated_event["value"]
+        return validated_event["run_id"]


Why did we make this change? What are the implications of changing the return value? Is this not a breaking change?

I renamed the key to make it more explicit, see here. This is similar to how it is done in other services, e.g. with the GlueDataQualityRuleRecommendationRunOperator.

I'm happy to revert it to value if preferred, but as long as the change is consistently applied where it's used, the renaming shouldn't introduce any issues?
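To make the concern concrete: the trigger emits an event dict and execute_complete reads a key from it, so both sides have to agree on the key name. Assuming the trigger stores the same run id under the new key, the value the operator returns is unchanged and only the event's shape differs. The sketch below uses plain dicts and a hypothetical helper, `extract_run_id`, that accepts either key; it illustrates the compatibility question and is not code from the PR.

```python
from typing import Any


def extract_run_id(event: dict[str, Any]) -> str:
    """Return the job run id from a trigger event, tolerating the old key name."""
    if event.get("status") != "success":
        raise RuntimeError(f"Error in glue job: {event}")
    # Prefer the new, explicit key; fall back to the previous generic one.
    if "run_id" in event:
        return event["run_id"]
    return event["value"]
```

Whether such a fallback is worth carrying depends on whether events in the old shape can still reach the operator, for example from tasks deferred before an upgrade.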

o-nikolas · 2025-06-26T22:33:34Z

providers/amazon/src/airflow/providers/amazon/aws/operators/glue.py

@@ -231,7 +241,8 @@ def execute(self, context: Context):
                    run_id=self._job_run_id,
                    verbose=self.verbose,
                    aws_conn_id=self.aws_conn_id,
-                    job_poll_interval=self.job_poll_interval,
+                    waiter_delay=int(self.job_poll_interval),
+                    waiter_max_attempts=self.retry_limit,


self.retry_limit is zero by default, which means we never attempt again and fail immediately. This is causing our system tests to fail when run in deferrable mode.
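A minimal sketch of why a zero attempt budget fails straight away, assuming a waiter-style loop that gives up once its attempts are used. `wait_for` and `WaiterExhausted` are hypothetical names for illustration, not Airflow or botocore APIs, and the real waiter may treat a zero limit differently in detail.

```python
import time
from typing import Callable


class WaiterExhausted(Exception):
    """Raised when the attempt budget is used up before the check succeeds."""


def wait_for(check: Callable[[], bool], delay: float, max_attempts: int) -> int:
    """Poll check() up to max_attempts times; return the number of polls used."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            time.sleep(delay)  # wait before the next poll
    # With max_attempts=0 the loop body never runs, so we land here at once.
    raise WaiterExhausted(f"gave up after {max_attempts} attempts")
```

With `max_attempts=0` the job state is never even checked, which matches the immediate failure described above.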

o-nikolas · 2025-06-26T22:38:04Z

providers/amazon/src/airflow/providers/amazon/aws/sensors/glue.py

+    :param poke_interval: Polling period in seconds to check for the status of the job. (default: 120)
+    :param max_retries: Number of times before returning the current state. (default: 60)


This means we'll wait for 2 hours. Is that a sane default?

The defaults you added in the Trigger are 60 - 75, any reason to not match that here?

I followed the pattern used in other Glue sensors to stay consistent, as I was a bit unsure what to use myself, but as you said, it is probably better to match what's already there. I can adjust that.
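The arithmetic behind the two figures, reading "60 - 75" as a 60 second delay with 75 attempts. That reading is an assumption, though it is consistent with a total wait of 75 minutes. `max_wait` is a helper made up for this example.

```python
from datetime import timedelta


def max_wait(interval_seconds: int, attempts: int) -> timedelta:
    """Upper bound on total waiting time for a fixed-interval poller."""
    return timedelta(seconds=interval_seconds * attempts)


sensor_default = max_wait(120, 60)   # poke_interval=120, max_retries=60
trigger_default = max_wait(60, 75)   # assumed: 60 s delay, 75 attempts

print(sensor_default)   # 2:00:00
print(trigger_default)  # 1:15:00
```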

In apache#52243 the waiting was moved from custom code within the Glue hook to the AWS base waiters when deferring Glue jobs. The Trigger was given inappropriate inputs, which caused it to wait for zero attempts and made our tests fail. This change switches to the common parameters we use for other operators in deferrable mode, with the same defaults as the Trigger.

Note: previously this Operator waited indefinitely for the job to either complete or fail. The default now waits for 75 minutes. The AWS base waiter has no ability to wait indefinitely, nor do I think it should; that feels like a bug to me. So I'm considering this slight behaviour change a bug fix of a bug fix.

dominikhei added 6 commits June 17, 2025 19:24

Adjusted the GlueJobSensor to inherit from AwsBaseSensor

b6aca49

Changed timeout logic and added further tests

5dee231

Renamed test case due to removal of max_retries param

7d69f8f

Added custom GlueJob waiter

81f2501

Added new params to GlueJobOperator and fixed GlueTrigger tests

444f1c7

Refined params of operator, trigger and hook

97677a1

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jun 25, 2025

dominikhei marked this pull request as ready for review June 25, 2025 12:34

dominikhei requested review from eladkal and o-nikolas as code owners June 25, 2025 12:34

eladkal mentioned this pull request Jun 25, 2025

Use base aws classes in amazon provider Operators/Sensors/Triggers #35278

Open

28 tasks

eladkal requested a review from vincbeck June 25, 2025 13:50

vincbeck approved these changes Jun 26, 2025

vincbeck merged commit f641ef3 into apache:main Jun 26, 2025
138 checks passed

o-nikolas reviewed Jun 26, 2025

o-nikolas mentioned this pull request Jun 26, 2025

Fix GlueJobOperator deferred waiting #52314

Merged

kyungjunleeme mentioned this pull request Jun 28, 2025

Fix GlueJobOperator deferred waiting (#52314) #52384

Closed
