-
Notifications
You must be signed in to change notification settings - Fork 15.2k
handle overflow in TaskInstance next_retry_datetime fixes 47971 #48557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
handle overflow in TaskInstance next_retry_datetime fixes 47971 #48557
Conversation
2eac3d5
to
3c95374
Compare
If you turn off whitespace changes, only 6 lines are added in the new version and none are removed. :) |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Is it possible to localise the try-except block a bit more? Currently the block is a bit too large IMO. |
3c95374
to
37deb82
Compare
The overflow error is likely two places: on line 1120: And on line 1145: https://docs.python.org/3/library/datetime.html#timedelta-objects Keeping the try block as it currently is avoids having two try blocks, but possibly two try blocks would be more readable. I'll fiddle with it and try to improve. Thanks for the feedback. |
Looking a little closer, we cap MAX_RETRY_DELAY at 24 * 60 * 60 so probably it can't throw at line 1145.... I'll need to be sure that isn't configurable at run time, but as long as it is always less than 999999999 days then I can shorten up the section in the try block. I'll try to get to it tonight. |
37deb82
to
0eff257
Compare
Done. Thanks for the feedback, it is much cleaner now. |
dd39dbd
to
67b6c67
Compare
67b6c67
to
185ed8c
Compare
except OverflowError: | ||
min_backoff = MAX_RETRY_DELAY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s add a log message here so people know better what’s going on if they get unexpected behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't warn when limiting the retry time for other reasons. I think it should be consistent one way or the other. I'm happy to add a message here and add another ticket to log when limiting the time until the next retry at other locations. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me to another instance we do this kind of limiting? (We seem to only catch OverflowError in one other place and it’s for a different purpose)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lower in the function the code ensures the delay isn’t longer than a day, but it doesn’t log anything when it does so. I'm talking about this:
min(modded_hash, MAX_RETRY_DELAY)
around line 1149
We don't inform the user we are putting a cap on the exponential time, so it seems inconsistent to log about capping the value due to overflow but not due to exceeding the MAX_RETRY_DELAY. Overflow is just a implementation detail due number representation limits.... I think capping the value due to overflow and due to exceeded the MAX_RETRY_DELAY could both be unexpected.
This handles overflow when calculating the next execution time for a task instance by falling back to the configured maximum delay. The solution uses the same strategy that tenacity uses:
https://github.com/jd/tenacity/blob/main/tenacity/wait.py#L167
An alternate solution would be the determine the maximum tries that wouldn't exceed the maximum delay and then not calculate the timeout for values larger than that.
Something like
This fixes 47971