Resubmit "Make StatefulSet restart pods with phase Succeeded" #121389
/test pull-kubernetes-node-e2e-containerd
LGTM, one nit. I also think that, given the importance of the fix, it would be good to cover it with an e2e test. WDYT @alculquicondor @soltysh?
if err != nil {
	t.Error(err)
}
pods[0].Status.Phase = v1.PodSucceeded
nit: Maybe we could address this comment from previous review now: #120398 (comment)
I don't think e2e tests would give us the granularity we need, but an integration test would be good. Maybe as part of
I was thinking about e2e, because it is actually a change in kubelet that requires the fix, and it should be able to demonstrate there is currently an issue with 1.27+. I'm fine with either e2e or extending the integration test.
But can we make the kubelet enter the shutdown process in E2E?
Yeah, possibly we can if we annotate the test as Disruptive, but this is a huge complication. Going with integration test sounds better.
/cc @rphillips for gracefulshutdown issues.
7daf620 to e07d898
LGTM, just a tiny nit
@@ -433,6 +434,15 @@ func RecreatesFailedPod(t *testing.T, set *apps.StatefulSet, invariants invarian
	if isCreated(pods[0]) {
		t.Error("StatefulSet did not recreate failed Pod")
nit: make the message dependent on the phase
EDIT: non-blocking
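A minimal sketch of what this nit could look like, assuming the test knows which phase it set on the pod. `PodPhase` is a local stand-in for the core API type and `recreateErrorMessage` is a hypothetical helper, not a function from the PR:

```go
package main

import "fmt"

// PodPhase is a local stand-in for the string-typed pod phase in the core API.
type PodPhase string

const (
	PodFailed    PodPhase = "Failed"
	PodSucceeded PodPhase = "Succeeded"
)

// recreateErrorMessage is a hypothetical helper that builds the test failure
// message from the phase under test instead of hard-coding "failed".
func recreateErrorMessage(phase PodPhase) string {
	return fmt.Sprintf("StatefulSet did not recreate Pod in phase %s", phase)
}

func main() {
	// Print the message for both phases the test exercises.
	for _, phase := range []PodPhase{PodFailed, PodSucceeded} {
		fmt.Println(recreateErrorMessage(phase))
	}
}
```

In the real test the string would be passed to `t.Error` rather than printed.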
/lgtm
/approve
LGTM label has been added. Git tree hash: cf9e4225e158e61331ea0bd6ec1ab91b5f3f5234
/retest
/assign @soltysh
/approve
/triage accepted
/milestone v1.29
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alculquicondor, aleksandra-malinowska, soltysh
…y-pick-of-#121389-upstream-release-1.28 Automated cherry pick of #121389: Make StatefulSet restart pods with phase Succeeded
…y-pick-of-#121389-upstream-release-1.27 Automated cherry pick of #121389: Make StatefulSet restart pods with phase Succeeded
What type of PR is this?
/kind bug
/sig apps
What this PR does / why we need it:
The changes in #115331 altered how the pod phase is determined. It is now possible for StatefulSet pods to enter the Succeeded pod phase. This can happen when a pod is evicted or a node is stopped and the pod's container exits with code 0. StatefulSet should restart such pods. A StatefulSet pod can never truly complete, because validation enforces restartPolicy=Always.
This was merged in #120398 and reverted in #120755. The test failures caused by it have been fixed in #120731.
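The intended behavior can be sketched as a single predicate over the pod phase. This is an illustration only, not the controller code from the PR: `Pod` and `PodPhase` are local stand-ins for the core API types, and `needsRecreate` is a hypothetical name.

```go
package main

import "fmt"

// PodPhase is a local stand-in for the string-typed pod phase in the core API.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Pod is a hypothetical, reduced stand-in for v1.Pod.
type Pod struct {
	Name  string
	Phase PodPhase
}

// needsRecreate (hypothetical helper) reports whether a StatefulSet pod is in
// a terminal phase. Because restartPolicy=Always is enforced, a terminal pod
// never represents finished work, so both Failed and Succeeded qualify.
func needsRecreate(p Pod) bool {
	return p.Phase == PodFailed || p.Phase == PodSucceeded
}

func main() {
	pods := []Pod{
		{Name: "web-0", Phase: PodRunning},
		{Name: "web-1", Phase: PodFailed},
		{Name: "web-2", Phase: PodSucceeded}, // e.g. container exited 0 during node shutdown
	}
	for _, p := range pods {
		fmt.Printf("%s phase=%s recreate=%t\n", p.Name, p.Phase, needsRecreate(p))
	}
}
```

Treating Succeeded the same as Failed keeps the check to one condition; a pod in Pending or Running is left alone.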
Which issue(s) this PR fixes:
Part of #118310
Does this PR introduce a user-facing change?
/cc @mimowo