Resubmit "Make StatefulSet restart pods with phase Succeeded" #121389

Merged

aleksandra-malinowska
Contributor

@aleksandra-malinowska aleksandra-malinowska commented Oct 20, 2023

What type of PR is this?

/kind bug
/sig apps

What this PR does / why we need it:

After the changes in #115331, the way the pod phase is determined changed, and StatefulSet pods can now end up in the Succeeded phase. This can happen when a pod is evicted or a node is stopped and the pod's container exits with code 0. StatefulSet should restart such pods: a StatefulSet pod can never truly complete, because validation enforces restartPolicy=Always.
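
The restart condition described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `PodPhase` is a local stand-in for `v1.PodPhase`, and `needsRecreate` is a hypothetical helper name chosen for the example.

```go
package main

import "fmt"

// PodPhase is a local stand-in for v1.PodPhase so the sketch compiles
// without the Kubernetes API packages.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// needsRecreate (hypothetical name) reports whether a StatefulSet pod in the
// given phase should be deleted and recreated. Because restartPolicy=Always
// is enforced for StatefulSet pods, Succeeded is treated like Failed: the pod
// has terminated and will not come back on its own.
func needsRecreate(phase PodPhase) bool {
	return phase == PodFailed || phase == PodSucceeded
}

func main() {
	for _, p := range []PodPhase{PodPending, PodRunning, PodSucceeded, PodFailed} {
		fmt.Printf("%-9s recreate=%v\n", p, needsRecreate(p))
	}
}
```

Treating both terminal phases the same way keeps the controller logic simple: any pod that can no longer run is replaced, regardless of its exit code.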

This change was first merged in #120398 and then reverted in #120755. The test failures it caused have since been fixed in #120731.

Which issue(s) this PR fixes:

Part of #118310

Does this PR introduce a user-facing change?

Fixes a 1.27 regression where StatefulSet might not restart a pod after eviction or node failure.

/cc @mimowo

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/apps Categorizes an issue or PR as relevant to SIG Apps. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 20, 2023
@aleksandra-malinowska
Contributor Author

/cc @mimowo @soltysh

@mimowo
Contributor

mimowo commented Oct 20, 2023

/test pull-kubernetes-node-e2e-containerd
The failure looks unrelated.

Contributor

@mimowo mimowo left a comment

LGTM, one nit. I also think that, given the importance of the fix, it would be good to cover it with an e2e test. WDYT @alculquicondor @soltysh?

	if err != nil {
		t.Error(err)
	}
	pods[0].Status.Phase = v1.PodSucceeded

nit: Maybe we could now address this comment from the previous review: #120398 (comment)

@alculquicondor
Member

I don't think e2e tests would give us the granularity we need, but an integration test would be good. Maybe as part of TestDeletingAndFailedPods in https://github.com/kubernetes/kubernetes/blame/master/test/integration/statefulset/statefulset_test.go ?

@mimowo
Contributor

mimowo commented Oct 23, 2023

> I don't think e2e tests would give us the granularity we need, but an integration test would be good. Maybe as part of TestDeletingAndFailedPods in https://github.com/kubernetes/kubernetes/blame/master/test/integration/statefulset/statefulset_test.go ?

I was thinking about e2e because it is actually a change in the kubelet that makes this fix necessary, and an e2e test should be able to demonstrate that the issue currently exists in 1.27+. I'm fine with either an e2e test or extending the integration test.

@alculquicondor
Member

But can we make the kubelet enter the shutdown process in E2E?

@mimowo
Contributor

mimowo commented Oct 23, 2023

> But can we make the kubelet enter the shutdown process in E2E?

Yeah, we possibly can if we annotate the test as Disruptive, but that is a huge complication. Going with an integration test sounds better.

@kannon92
Contributor

/cc @rphillips for gracefulshutdown issues.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 26, 2023
Contributor

@mimowo mimowo left a comment

LGTM, just a tiny nit

@@ -433,6 +434,15 @@ func RecreatesFailedPod(t *testing.T, set *apps.StatefulSet, invariants invarian
	if isCreated(pods[0]) {
		t.Error("StatefulSet did not recreate failed Pod")

@mimowo mimowo Oct 26, 2023

nit: make the message dependent on the phase

EDIT: non-blocking
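
One way the nit above could be addressed is to build the error message from the phase under test. The sketch below is only an illustration under assumptions: `PodPhase` is a local stand-in for `v1.PodPhase`, and `notRecreatedMessage` is a hypothetical helper that does not appear in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// PodPhase is a local stand-in for v1.PodPhase.
type PodPhase string

// notRecreatedMessage (hypothetical helper) returns a test failure message
// that names the phase the pod was put into, so a single assertion can serve
// both the Failed and the Succeeded variants of the test.
func notRecreatedMessage(phase PodPhase) string {
	return fmt.Sprintf("StatefulSet did not recreate %s Pod", strings.ToLower(string(phase)))
}

func main() {
	fmt.Println(notRecreatedMessage("Failed"))    // StatefulSet did not recreate failed Pod
	fmt.Println(notRecreatedMessage("Succeeded")) // StatefulSet did not recreate succeeded Pod
}
```

In a real test the result would be passed to `t.Error`, keeping the existing wording for the Failed case while making the Succeeded case distinguishable in the logs.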

Member

@alculquicondor alculquicondor left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: cf9e4225e158e61331ea0bd6ec1ab91b5f3f5234

@aleksandra-malinowska
Contributor Author

/retest

@mimowo
Contributor

mimowo commented Oct 27, 2023

/assign @soltysh

Contributor

@soltysh soltysh left a comment

/approve

@soltysh
Contributor

soltysh commented Oct 30, 2023

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 30, 2023
@soltysh
Contributor

soltysh commented Oct 30, 2023

/milestone v1.29

@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 30, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, aleksandra-malinowska, soltysh

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2023
@k8s-ci-robot k8s-ci-robot merged commit 05765a8 into kubernetes:master Oct 30, 2023
15 checks passed
k8s-ci-robot added a commit that referenced this pull request Nov 16, 2023
…y-pick-of-#121389-upstream-release-1.28

Automated cherry pick of #121389: Make StatefulSet restart pods with phase Succeeded
k8s-ci-robot added a commit that referenced this pull request Nov 16, 2023
…y-pick-of-#121389-upstream-release-1.27

Automated cherry pick of #121389: Make StatefulSet restart pods with phase Succeeded
@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Mar 27, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

7 participants