Readiness Tracker Deadlock for Terminating Resources #660
Comments
Thanks for finding this! We should also cancel expectations (call CancelExpect) for any observed deletes. Otherwise there is a race condition where an object is deleted after the initial list is gathered but before the operator begins syncing.
This also involves modifying the cancel expectation function to short-circuit if expectations are already satisfied, to avoid a memory leak.
It looks like we are already calling CancelExpect for all deleted constraint templates, so short-circuiting-if-populated would also fix a memory leak there.
@theMagicalKarp thank you again!
Ack, lemme know. Happy to help if the scope is too large.
Introduces a circuit breaker into objectTracker which is tripped once expectations have been met. When tripped, internal state tracking memory can be freed and subsequent operations will not consume additional memory in the tracker. Closes open-policy-agent#660 Signed-off-by: Oren Shomron <[email protected]>
Introduces a circuit breaker into objectTracker which is tripped once expectations have been met. When tripped, internal state tracking memory can be freed and subsequent operations will not consume additional memory in the tracker. Closes #660 Signed-off-by: Oren Shomron <[email protected]> Co-authored-by: Max Smythe <[email protected]>
Hi @maxsmythe @shomron, is there a workaround for this without having to update to the latest image? We're running version v3.1.0-beta.9 of the controller. We're seeing what appears to be a similar issue affecting the pod resource, but I don't see anything in a "Terminating" state. Removing pods from the config file and deleting the controller pod does allow it to come up and pass the readiness probe. I'm just not sure which pod within the cluster is the offending one causing the readiness probe to return a 500. Any help would be appreciated.
@niroowns without updating the image, your best bet would be to remove the
What
This was discovered by 💪@SimKev2💪, who brought it to my attention.
Resources that are marked for termination when gatekeeper starts up cause it to fail its readiness probes indefinitely (assuming gatekeeper is configured to sync those resources). This seems to be because gatekeeper tries to ensure it has successfully loaded the sync cache before it handles any traffic. However, it "expects" terminating resources but never "observes" them, so the readiness check deadlocks.
Steps
Delete it with
kubectl delete ns rob-test
(This should hang, since the finalizer won't resolve. The purpose of this is to put a resource permanently into the terminating state.)
Ensure Namespace is in the sync config.
After gatekeeper starts up, it should fail its readiness checks indefinitely.
This seems to be because, when setting the expectations for the objectTracker, we take the rob-test namespace into account (even though it's terminating). This becomes a problem later when the sync controller runs, since it doesn't observe resources marked for termination.
I think what makes sense is to run Observe on resources that have been marked for termination, i.e. add
r.tracker.ForData(gvk).Observe(instance)
here: gatekeeper/pkg/controller/sync/sync_controller.go, line 178 in 21b6b4a
FYI @shomron
Environment: