
WIP: Add RestartContainerDuringTermination feature #125710


Open: wants to merge 7 commits into base: master


@gjkim42 (Member) commented Jun 25, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #121398

TODOs

This manages container termination in a non-blocking way.
xref:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The new alpha feature gate `RestartContainerDuringTermination` is now available. When enabled, it allows a container to restart while another container in the same pod, or the pod itself, is terminating.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4438-container-restart-termination

/uncc all

@k8s-ci-robot added the size/XXL, do-not-merge/work-in-progress, kind/feature, do-not-merge/release-note-label-needed, cncf-cla: yes, do-not-merge/needs-sig, and needs-triage labels Jun 25, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-priority, area/kubelet, area/test, sig/node, and sig/testing labels and removed the do-not-merge/needs-sig label Jun 25, 2024
@gjkim42 changed the title from "WIP: Manage container termination in a non-blocking manner" to "WIP: Add RestartContainerDuringTermination feature" Jun 25, 2024
@k8s-ci-robot added the needs-rebase label Jun 26, 2024
@gjkim42 force-pushed the non-blocking-container-termination branch from dc8af25 to a2fdfd5 on July 2, 2024 13:27
@k8s-ci-robot removed the needs-rebase label Jul 2, 2024
@gjkim42 force-pushed the non-blocking-container-termination branch from a2fdfd5 to cb0164e on July 2, 2024 13:29
@gjkim42 (Member Author) commented Jul 2, 2024

/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
/test pull-kubernetes-node-e2e-containerd-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features

@gjkim42 force-pushed the non-blocking-container-termination branch from cb0164e to 12b5db2 on July 2, 2024 14:03
@k8s-ci-robot added the release-note label and removed the do-not-merge/release-note-label-needed label Jul 2, 2024

@gjkim42 (Member Author) commented Jul 2, 2024

/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
/test pull-kubernetes-node-e2e-containerd-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features

Comment on lines 2006 to 2155
```go
var podStopped bool
var err error
if !utilfeature.DefaultFeatureGate.Enabled(features.RestartContainerDuringTermination) {
	// Legacy path: killPod blocks until every container in the pod has stopped.
	p := kubecontainer.ConvertPodStatusToRunningPod(kl.getRuntime().Type(), podStatus)
	err = kl.killPod(ctx, pod, p, gracePeriod)
	// Because killPod is blocking, a nil error means the pod has stopped.
	// Without this, podStopped stays false and the check below always fails.
	podStopped = err == nil
} else {
	// Non-blocking path: syncTerminatingPod reports whether the pod has stopped yet.
	podStopped, err = kl.syncTerminatingPod(ctx, pod, podStatus, gracePeriod, kl.getPullSecretsForPod(pod), kl.backOff)
}
if err != nil {
	kl.recorder.Eventf(pod, v1.EventTypeWarning, events.FailedToKillPod, "error killing pod: %v", err)
	// There was an error killing the pod, so return that error directly.
	utilruntime.HandleError(err)
	return err
}

if !podStopped {
	// The pod is not stopped yet: return an error to signal that the
	// sync loop should back off and try again later.
	return fmt.Errorf("pod is not yet stopped")
}
```

Member Author

/cc @smarterclayton

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4438-container-restart-termination
This PR changes `(*Kubelet).killPod` to work in a non-blocking way, so that other parts of the `syncLoop` can keep making progress while containers are terminating.

What do you think?
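To illustrate why blocking termination prevents restarts (the problem in the linked issue), here is a toy model. The `container` type, its states, and `syncPod` are hypothetical and far simpler than the kubelet's real pod status handling; "blocking" mode stands in for the pre-feature behavior where the pass waits on a terminating container, and "non-blocking" mode restarts an exited restartable container anyway.

```go
package main

import "fmt"

// containerState is a simplified, hypothetical container lifecycle.
type containerState int

const (
	running containerState = iota
	terminating
	exited
)

// container is a hypothetical model; restartable marks a container that
// should be restarted when it exits.
type container struct {
	name        string
	state       containerState
	restartable bool
}

// syncPod runs one pass over the pod and returns the names it restarted.
// In blocking mode, any terminating container stalls the whole pass.
func syncPod(containers []*container, blocking bool) []string {
	var restarted []string
	if blocking {
		for _, c := range containers {
			if c.state == terminating {
				return restarted // wait for termination before doing anything else
			}
		}
	}
	for _, c := range containers {
		if c.state == exited && c.restartable {
			c.state = running
			restarted = append(restarted, c.name)
		}
	}
	return restarted
}

// newPod builds a fresh pod where "app" is terminating and "sidecar" exited.
func newPod() []*container {
	return []*container{
		{name: "app", state: terminating},
		{name: "sidecar", state: exited, restartable: true},
	}
}

func main() {
	fmt.Println(syncPod(newPod(), true))  // []
	fmt.Println(syncPod(newPod(), false)) // [sidecar]
}
```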

@k8s-ci-robot added the needs-rebase label Jul 17, 2024
@@ -1158,6 +1166,9 @@ var defaultKubernetesFeatureGates = map[featuregate.Feature]featuregate.FeatureS

```go
ServiceTrafficDistribution: {Default: false, PreRelease: featuregate.Alpha},

// FIXME: Disable by default
RestartContainerDuringTermination: {Default: true, PreRelease: featuregate.Alpha},
```
Member

Change this before merge?
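Kubernetes alpha features are expected to be disabled by default, which is what the FIXME and this question point at. A simple pre-merge check could be sketched as below. `featureSpec` and the `alpha`/`beta` constants are hypothetical stand-ins modeled on the `Default`/`PreRelease` fields in the diff; they are not the `k8s.io/component-base/featuregate` types.

```go
package main

import (
	"fmt"
	"sort"
)

// prerelease is a hypothetical stand-in for a feature's maturity level.
type prerelease string

const (
	alpha prerelease = "ALPHA"
	beta  prerelease = "BETA"
)

// featureSpec loosely mirrors the fields used in the diff above.
type featureSpec struct {
	Default    bool
	PreRelease prerelease
}

// alphaEnabledByDefault returns, sorted by name, every alpha feature whose
// default is true, i.e. gates that should be flipped before merging.
func alphaEnabledByDefault(gates map[string]featureSpec) []string {
	var bad []string
	for name, spec := range gates {
		if spec.PreRelease == alpha && spec.Default {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	gates := map[string]featureSpec{
		"ServiceTrafficDistribution":        {Default: false, PreRelease: alpha},
		"RestartContainerDuringTermination": {Default: true, PreRelease: alpha},
	}
	fmt.Println(alphaEnabledByDefault(gates)) // [RestartContainerDuringTermination]
}
```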

@gjkim42 force-pushed the non-blocking-container-termination branch from 12b5db2 to a54bf2e on July 23, 2024 11:35
@k8s-ci-robot added the needs-rebase label Mar 5, 2025
@ffromani (Contributor)

Hi @gjkim42, from SIG Node! The 1.33 code freeze deadline is looming and this PR is still marked as WIP. Are you still aiming for the 1.33 milestone? Do you need reviews?

@gjkim42 (Member Author) commented Mar 18, 2025

Unfortunately, I don't have the bandwidth to work on this PR.

@matthyx (Contributor) commented Mar 18, 2025

> Unfortunately, I don't have the bandwidth to work on this PR.

I can take a look if you want. Did you have any pending changes to push?

@gjkim42 (Member Author) commented Mar 18, 2025

No, I don't. I still haven't been able to check the test case you mentioned earlier.

@ffromani (Contributor)

> Unfortunately, I don't have the bandwidth to work on this PR.

Ack, no worries. Just checking so we can plan the remaining time until the code freeze.

@gjkim42 force-pushed the non-blocking-container-termination branch from 9c789d2 to 024bc2f on March 18, 2025 13:00
@k8s-ci-robot added and removed the needs-rebase label Mar 18, 2025
@gjkim42 force-pushed the non-blocking-container-termination branch from e29a2f7 to af633df on April 10, 2025 09:02
@k8s-ci-robot removed the needs-rebase label Apr 10, 2025
@gjkim42 force-pushed the non-blocking-container-termination branch from 07c592e to f9d74f3 on April 15, 2025 04:45
@gjkim42 force-pushed the non-blocking-container-termination branch 3 times, most recently from 21a8232 to bf98fec on April 25, 2025 01:51
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gjkim42
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gjkim42 force-pushed the non-blocking-container-termination branch from bf98fec to e77f9fd on April 25, 2025 02:51
@gjkim42 force-pushed the non-blocking-container-termination branch from e77f9fd to f1b7bd7 on May 8, 2025 12:33
@k8s-ci-robot (Contributor)

@gjkim42: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-kubernetes-node-e2e-containerd-features | 12b5db2 | link | false | `/test pull-kubernetes-node-e2e-containerd-features` |
| pull-kubernetes-linter-hints | f1b7bd7 | link | false | `/test pull-kubernetes-linter-hints` |
| pull-kubernetes-unit-windows-master | f1b7bd7 | link | false | `/test pull-kubernetes-unit-windows-master` |
| pull-kubernetes-e2e-gce-cos-alpha-features | f1b7bd7 | link | false | `/test pull-kubernetes-e2e-gce-cos-alpha-features` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@k8s-ci-robot (Contributor)

PR needs rebase.


@k8s-ci-robot added the needs-rebase label Jun 27, 2025
Labels
area/kubelet, area/test, cncf-cla: yes, do-not-merge/work-in-progress, kind/feature, needs-priority, needs-rebase, needs-triage, release-note, sig/node, sig/testing, size/XXL
Projects
Archived in project
Status: Waiting on Author
Development

Successfully merging this pull request may close these issues.

A container cannot restart when there is any terminating container in the same pod
9 participants