DRA: fix deleting orphaned ResourceClaim on startup #132533
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi! I'm a grad student working on a research project about using large language models to automate code review. Based on your commit ef117ed and the changes in pkg/controller/resourceclaim/controller.go, my tool generated this comment:
- Nil Pointer Dereference: Ensure that there are adequate checks in place to handle cases where `newClaim` or `oldClaim` could be nil before accessing their properties.
- Error Handling: Consider implementing a more robust error handling strategy to prevent cascading failures if `ec.podIndexer.ByIndex` returns an error.
- Error Handling Tests: Cover error handling paths, particularly when there are issues retrieving claims from the lister or checking ownership.
- Logical Inversion: The change from `deleted := newObj != nil` to `deleted := newObj == nil` is correct if the intention is to treat a nil `newObj` as an indication of deletion. Ensure this change aligns with the intended behavior of the function (see the sketch after this list).
- Impact on Subsequent Logic: Verify the logging statement for deleted claims to ensure it correctly logs and handles the case when a claim is deleted.
- Testing: It is essential to have tests that cover scenarios for both adding and deleting resource claims to ensure that the new logic behaves as expected.
- Testing the Logging Behavior: Add tests to verify the correct logging output for both deleted and non-deleted claims.
- Separation of Concerns: Consider breaking the `enqueueResourceClaim` function into smaller, more focused functions for checking deletions, updating metrics, and enqueuing claims.
- Loop Efficiency: Consider optimizing the `ec.enqueuePod` function in the loop iterating over `objs`. Batching the enqueueing of pods or using a more efficient data structure could reduce overhead.
- Resource Claim Checks: In the `podNeedsWork` function, cache the results of `ec.claimLister.ResourceClaims(pod.Namespace).Get(*claimName)` if the same claims are checked multiple times within the same execution context.
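To make the nil-handling and deletion-detection points above concrete, here is a minimal, self-contained sketch of the pattern being described. All names and signatures are assumed for illustration; this is not the actual ResourceClaim controller code.

```go
// Minimal sketch of the deletion-detection pattern discussed above.
// All names and signatures are assumed for illustration.
package main

import "fmt"

type claim struct{ name string }

// handleClaim follows the usual shared-informer convention where a nil
// newObj means the object was deleted.
func handleClaim(oldObj, newObj *claim) {
	deleted := newObj == nil // nil newObj signals a deletion event
	if deleted {
		if oldObj == nil {
			// Nothing usable to enqueue; guard against a nil dereference.
			return
		}
		fmt.Printf("claim %s deleted, enqueue for cleanup\n", oldObj.name)
		return
	}
	fmt.Printf("claim %s added or updated\n", newObj.name)
}

func main() {
	handleClaim(&claim{name: "claim-a"}, nil)                     // deletion
	handleClaim(nil, &claim{name: "claim-b"})                     // add
	handleClaim(&claim{name: "claim-c"}, &claim{name: "claim-c"}) // update
}
```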
As part of my research, I'm trying to understand how useful these comments are in real-world development. If you have a moment, I'd be super grateful if you could quickly reply to these two yes/no questions:
- Does this comment provide suggestions from a dimension you hadn’t considered?
- Do you find this comment helpful?
Thanks a lot for your time and feedback! And sorry again if this message is a bother.
@dcq01: it is a bother. Please don't do this in the Kubernetes project, or in any other open source project without first ensuring that the owners of the project welcome such comments.
/lgtm
/approve
/hold
There's still time to add tests; I'll lift the hold when I'm ready to merge #132033.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: nojnhuh, pohly
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
LGTM label has been added. Git tree hash: a7c1bb4213cc1efd998d3a9ff75d9d983fc32736
New changes are detected. LGTM label has been removed.
I tried updating the tests to better exercise this change. The approach I took was to modify the existing tests that check the metrics tracked by the controller.
I'm also happy to iterate on these tests in a separate PR if we want this fix sooner.
g.Eventually(ec.queue.Len).
	WithTimeout(5 * time.Second).
	Should(gomega.Equal(len(expectedKeys)), lenDiffMessage)
g.Consistently(ec.queue.Len).
	WithTimeout(1 * time.Second).
	Should(gomega.Equal(len(expectedKeys)), lenDiffMessage)

for _, expected := range expectedKeys {
	actual, shuttingDown := ec.queue.Get()
	g.Expect(shuttingDown).To(gomega.BeFalseBecause("workqueue is unexpectedly shutting down"))
	g.Expect(actual).To(gomega.Equal(expected))
	ec.queue.Forget(actual)
	ec.queue.Done(actual)
}
This logic doesn't feel perfectly airtight and it slows the test way down (from ~3s to ~12s, mostly from the `Consistently`), so I'm open to ways to make that more robust and/or eliminate the long timeouts here. This might be a place that could make use of the `Mock` workqueue if it were importable here.
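For what it's worth, a hypothetical recording queue along those lines could let the test compare enqueued keys synchronously instead of polling. This is only a sketch under the assumption that the controller's queue could be swapped out in tests; it is not the client-go workqueue mock, and the names and interface are made up here.

```go
// Hypothetical recording queue for tests: records keys instead of
// scheduling real work. Names and interface are assumptions.
package main

import (
	"fmt"
	"sync"
)

type recordingQueue struct {
	mu   sync.Mutex
	keys []string
}

// Add records the key instead of enqueuing real work.
func (q *recordingQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.keys = append(q.keys, key)
}

// Keys returns a copy of everything enqueued so far.
func (q *recordingQueue) Keys() []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	return append([]string(nil), q.keys...)
}

func main() {
	q := &recordingQueue{}
	q.Add("default/claim-a")
	q.Add("default/claim-b")
	// A test could compare this slice directly to expectedKeys.
	fmt.Println(q.Keys())
}
```

With something like this injected into the controller under test, the `Eventually`/`Consistently` polling (and most of the added runtime) could potentially be dropped.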
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR implements the fix for #132334 proposed by @pohly in that issue and tested in #132295. This change fixes the logic in the ResourceClaim controller to clean up deleted ResourceClaims in upgrade and similar scenarios involving kube-controller-manager restarts.
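For readers unfamiliar with the issue, here is a rough conceptual sketch of what cleaning up orphaned claims after a restart means: find claims whose owning pod no longer exists and enqueue them for deletion. Hypothetical names and simplified types only; the real controller works through informers, listers, and the work queue.

```go
// Rough conceptual sketch of the startup cleanup: find claims whose owning
// pod no longer exists. Hypothetical names and simplified types only.
package main

import "fmt"

type resourceClaim struct {
	namespace string
	name      string
	ownerPod  string // name of the pod that owns this claim, if any
}

// orphanedClaims returns the claims whose owning pod is no longer present.
func orphanedClaims(claims []resourceClaim, livePods map[string]bool) []resourceClaim {
	var orphaned []resourceClaim
	for _, c := range claims {
		if c.ownerPod != "" && !livePods[c.ownerPod] {
			orphaned = append(orphaned, c)
		}
	}
	return orphaned
}

func main() {
	claims := []resourceClaim{
		{namespace: "default", name: "claim-a", ownerPod: "pod-a"},
		{namespace: "default", name: "claim-b", ownerPod: "pod-gone"},
	}
	livePods := map[string]bool{"pod-a": true}
	for _, c := range orphanedClaims(claims, livePods) {
		fmt.Printf("would enqueue %s/%s for cleanup\n", c.namespace, c.name)
	}
}
```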
Which issue(s) this PR is related to:
#132334
Special notes for your reviewer:
I plan to iterate on a test to reproduce the issue soon, but opened this PR now to unblock #132295.
Does this PR introduce a user-facing change?