
Correctly treat requeues on reschedule sensors as resetting after each reschedule #51410


Merged: 12 commits merged into apache:main on Jul 1, 2025


@collinmcnulty (Contributor) commented Jun 4, 2025:

Reschedule sensors move into and out of the running state repeatedly within a single try_number. Because the requeue logic allows 3 requeues per try_number, a reschedule sensor that needs to be requeued several times, even when those requeues are many hours apart, can still fail. This PR changes that so that only requeues after the last time the task was running (if ever) are included.

closes #49971
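A minimal Python sketch of the counting change described above, under stated assumptions. It is not Airflow's implementation: the `LogEvent` record, the `"running"`/`"requeued"` event labels, `MAX_REQUEUES = 3` (taken from the "3 requeues per try_number" figure above), and the `exceeds_limit` comparison are all hypothetical stand-ins for the scheduler's log-table query. The example models one try_number of a reschedule sensor that is requeued every four hours and runs in between.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEvent:
    """Hypothetical stand-in for a log row belonging to one try_number of a task."""
    when: datetime
    event: str  # hypothetical labels: "running" or "requeued"


# Assumption: the limit of 3 requeues per try_number mentioned in the description.
MAX_REQUEUES = 3


def requeues_per_try(events: list[LogEvent]) -> int:
    """Old behaviour: every requeue in the try_number counts toward the limit."""
    return sum(1 for e in events if e.event == "requeued")


def requeues_since_last_running(events: list[LogEvent]) -> int:
    """New behaviour: only requeues after the most recent "running" event count."""
    running = [e.when for e in events if e.event == "running"]
    cutoff = max(running) if running else None  # None: task never ran, count every requeue
    return sum(
        1
        for e in events
        if e.event == "requeued" and (cutoff is None or e.when > cutoff)
    )


def exceeds_limit(count: int) -> bool:
    # Assumption: the task fails once it has been requeued more than MAX_REQUEUES times.
    return count > MAX_REQUEUES


# One try_number of a reschedule sensor: requeued every 4 hours, running in between.
start = datetime(2025, 6, 4)
events: list[LogEvent] = []
for hours in (0, 4, 8, 12):
    events.append(LogEvent(start + timedelta(hours=hours), "requeued"))
    events.append(LogEvent(start + timedelta(hours=hours, minutes=5), "running"))
events.append(LogEvent(start + timedelta(hours=16), "requeued"))

print(requeues_per_try(events), exceeds_limit(requeues_per_try(events)))  # 5 True
print(requeues_since_last_running(events), exceeds_limit(requeues_since_last_running(events)))  # 1 False
```

Under the per-try count, the sensor fails once more than three requeues have accumulated across the whole try, even though it ran between each of them. Counting only from the last "running" event resets the budget after every reschedule.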

@boring-cyborg (bot) added the area:Scheduler label on Jun 4, 2025
@collinmcnulty (Contributor, Author):

The changes that introduced this bug were in #43520.

@collinmcnulty (Contributor, Author):

Passing Breeze tests now.

@collinmcnulty marked this pull request as ready for review on June 4, 2025 at 20:36
@collinmcnulty changed the title from "Filter log events to after last running" to "Correctly treat requeues on reschedule sensors as resetting after each reschedule" on Jun 4, 2025
@dstandish (Contributor):

> This PR changes that so that only requeues after the last time the task was running (if ever) are included.

Included in what?

@dstandish (Contributor) left a review comment:

I would like this one comment fixed first, though: https://github.com/apache/airflow/pull/51410/files#r2175543440

@jedcunningham (Member):

> > This PR changes that so that only requeues after the last time the task was running (if ever) are included.
>
> Included in what?

In the count we use to determine how many requeues a task has had. We want that count to apply only to the current "reschedule", not to accumulate across all "reschedules" for this try.

@jedcunningham added this to the Airflow 3.0.3 milestone on Jul 1, 2025
@jedcunningham added the backport-to-v3-0-test and backport-to-v2-11-test labels on Jul 1, 2025
@jedcunningham merged commit a362101 into apache:main on Jul 1, 2025 (58 checks passed)
@github-actions (bot) commented Jul 1, 2025:

Backport failed to create: v3-0-test.

You can attempt to backport this manually by running:

cherry_picker a362101 v3-0-test

This should apply the commit to the v3-0-test branch and leave it in a conflicted state, marking the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

@github-actions (bot) commented Jul 1, 2025:

Backport failed to create: v2-11-test.

You can attempt to backport this manually by running:

cherry_picker a362101 v2-11-test

This should apply the commit to the v2-11-test branch and leave it in a conflicted state, marking the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

Labels: area:Scheduler, backport-to-v2-11-test, backport-to-v3-0-test
Development: linked issue "Requeues doesn't work properly with reschedule sensors" (#49971)
Participants: 3