fix(pdb): Ignore terminating pods when checking for controllers to prevent race condition #132431
base: master
…llers to find expected scale
- Updates the PodDisruptionBudget controller logic to skip pods with a DeletionTimestamp when verifying controller presence to find expected scale.
- Adds a unit test to ensure that pods marked for deletion do not cause errors during expected scale calculation.
- Fixes a race condition where a terminating pod's controller may already be deleted, preventing unnecessary errors.
Signed-off-by: atilsensalduz <[email protected]>
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it.
Welcome @atilsensalduz!
Hi @atilsensalduz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: atilsensalduz. The full list of commands accepted by this bot can be found here.
@@ -896,7 +896,7 @@ func (dc *DisruptionController) getExpectedScale(ctx context.Context, pdb *polic
 				break
 			}
 		}
-		if !foundController {
+		if !foundController && pod.DeletionTimestamp == nil {
I don't know much about the context, but does this change require modifying the error message? 🤔
err = fmt.Errorf("found no controllers for pod %q", pod.Name)
I think the current error message is still accurate, since it only triggers for pods that are not being deleted, but I’m happy to make it more explicit if you prefer.
Hi @mortent, @smarterclayton, just a friendly ping on this when you have a moment. Thanks!
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes a race condition in the PDB controller when determining the expected scale of pods.
Problem:
Previously, if a pod was terminating (had a DeletionTimestamp) and its controller had already been deleted (for example, during a rolling restart with revisionHistoryLimit: 0), the PDB controller returned an error because it could not find a controller for the pod. This is a race between the deletion of the controller and the PDB controller's scale lookup for the still-terminating pod, and it could result in unnecessary errors or incorrect PDB status.
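Solution:
The PDB controller now skips pods that have a DeletionTimestamp when verifying controller presence to find the expected scale. A unit test ensures that pods marked for deletion do not cause errors during the expected scale calculation.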
Impact:
This change prevents the PDB controller from reporting errors or misbehaving in scenarios where pods are terminating and their controllers have already been deleted, thus making the disruption logic more robust.
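To make the behavior concrete, here is a minimal, self-contained sketch of the check described above. It is not the actual getExpectedScale implementation: the pod struct, the scales map, and the helpers expectedScale and examplePods are hypothetical simplifications standing in for the real Kubernetes types and controller finders. Only the error message and the DeletionTimestamp condition are taken from this PR.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a hypothetical, simplified stand-in for v1.Pod.
// ControllerUID is assumed to always be set (the owner reference exists).
type pod struct {
	Name              string
	ControllerUID     string
	DeletionTimestamp *time.Time // non-nil once the pod is terminating
}

// expectedScale sums the scale of every distinct controller owning a pod.
// scales maps controller UID to scale for controllers that still exist;
// it stands in for the real controller finders (an assumption of this sketch).
func expectedScale(pods []pod, scales map[string]int32) (int32, error) {
	seen := map[string]bool{}
	var total int32
	for _, p := range pods {
		scale, foundController := scales[p.ControllerUID]
		if !foundController {
			// Mirrors the changed condition: only a pod that is NOT
			// terminating is treated as an error when its controller is gone.
			if p.DeletionTimestamp == nil {
				return 0, fmt.Errorf("found no controllers for pod %q", p.Name)
			}
			continue
		}
		if !seen[p.ControllerUID] {
			seen[p.ControllerUID] = true
			total += scale
		}
	}
	return total, nil
}

// examplePods models a rolling restart in which the old controller ("rs-old")
// is already deleted while one of its pods is still terminating.
func examplePods() ([]pod, map[string]int32) {
	now := time.Now()
	pods := []pod{
		{Name: "web-new-1", ControllerUID: "rs-new"},
		{Name: "web-new-2", ControllerUID: "rs-new"},
		{Name: "web-old-1", ControllerUID: "rs-old", DeletionTimestamp: &now},
	}
	return pods, map[string]int32{"rs-new": 2}
}

func main() {
	pods, scales := examplePods()
	scale, err := expectedScale(pods, scales)
	fmt.Println(scale, err)
}
```

In this model the terminating pod web-old-1 is skipped, so the lookup succeeds using only the surviving controller. Clearing its DeletionTimestamp makes the same lookup fail with the existing error message, which is the case the review thread above says the message still describes accurately.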
Which issue(s) this PR is related to:
Closes #130723
Does this PR introduce a user-facing change?
No.