Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sidecars: terminate sidecars after main containers #120620

Merged
merged 1 commit into from
Oct 31, 2023

Conversation

tzneal
Copy link
Contributor

@tzneal tzneal commented Sep 12, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Sidecars should terminate:

  • after all main containers have exited
  • serialized and in reverse order

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Sidecar termination is now serialized and each sidecar container will receive a SIGTERM after all main containers and later starting sidecar containers have terminated.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/753 
- [KEP Update]: https://github.com/kubernetes/enhancements/pull/4183

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 12, 2023
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 12, 2023
@tzneal
Copy link
Contributor Author

tzneal commented Sep 12, 2023

/test

@k8s-ci-robot
Copy link
Contributor

@tzneal: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cadvisor-e2e-kubernetes
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-scale-performance-manual
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-ci-kubernetes-unit-windows
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-master
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-ec2
  • /test pull-kubernetes-e2e-ec2-conformance
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-cos-kubetest2
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-kubelet-credential-provider
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-alpha-features
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-e2e-storage-kind-disruptive
  • /test pull-kubernetes-e2e-ubuntu-gce-network-policies
  • /test pull-kubernetes-integration-eks
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-linter-hints
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-arm64-e2e-containerd-ec2
  • /test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-arm64-ubuntu-serial-gce
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-1-7-dra
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-ec2
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-e2e-containerd-sidecar-containers
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-e2e-crio-dra
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-pod-disruption-conditions
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-memoryqos-cgrpv2
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-unit-experimental
  • /test pull-kubernetes-verify-strict-lint
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-conformance-kind-ipv6-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-ec2
  • pull-kubernetes-e2e-ec2-conformance
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-linter-hints
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify
  • pull-kubernetes-verify-strict-lint

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tzneal
Copy link
Contributor Author

tzneal commented Sep 12, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers

@gjkim42
Copy link
Member

gjkim42 commented Sep 12, 2023

/cc

@tzneal
Copy link
Contributor Author

tzneal commented Sep 13, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers

@tzneal tzneal changed the title WIP: sidecars: order termination of sidecars after main containers sidecars: terminate sidecars after main containers Sep 13, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 13, 2023
@tzneal
Copy link
Contributor Author

tzneal commented Sep 13, 2023

/cc @matthyx

@tzneal
Copy link
Contributor Author

tzneal commented Sep 13, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers

Copy link
Member

@gjkim42 gjkim42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one concern,

If a container does not have pre stop hook, we may want to restart until it is not its turn to exit. How can we handle that case? or do we want to not restart all sidecar containers?

// time remaining
if ordering != nil && gracePeriod > 0 {
// grace period is only in seconds, so the time we've waited gets truncated downward
gracePeriod -= int64(ordering.waitForTurn(containerName, gracePeriod))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: Subtracting the waiting time from each container's termination grace period.

lgtm, as this way would not break the existing behavior.

@tzneal
Copy link
Contributor Author

tzneal commented Sep 13, 2023

If a container does not have pre stop hook, we may want to restart until it is not its turn to exit. How can we handle that case? or do we want to not restart all sidecar containers?

Definitely open to other opinions, but I think that once we're in the termination phase, we don't restart containers.

@matthyx
Copy link
Contributor

matthyx commented Oct 17, 2023

/assign @mrunalp

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will work for the test cases I suggested, but I realize that it would be great to cover those. There is a small chance the terminatingOrdering may incorrectly wait on non-started containers to terminate.

@@ -49,6 +49,7 @@ import (
utilfeature "k8s.io/apiserver/pkg/util/feature"
runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
kubelettypes "k8s.io/kubelet/pkg/types"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor nit: I don't think this newline is needed here. Those imports seems to be from the same block.

}

// newTerminationOrdering constructs a terminationOrdering based on the pod spec and the currently running containers.
func newTerminationOrdering(pod *v1.Pod, runningContainerNames []string) *terminationOrdering {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this assume that initialization has completed? (i.e. all Init containers are not in running state?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ok, we only wait on main containers and sidecars. If a non-restartable init container is still running, we don't setup any waits for it and also pre-close the channel for any non-running main container.


})

ginkgo.It("should call sidecar container PreStop hook simultaneously", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test for termination started while the Pod is still in initialization stage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you please test when some containers hasn't started yet (blocked by the first container's PostStart lifecycle hook)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the first test. I wrote the second one, but it fails due to the pre-existing #116032 . If that is resolved, we can re-add the second test and it should pass.

Sidecars should terminate:
- after all main containers have exited
- serialized and in reverse order
@tzneal tzneal force-pushed the sidecar-termination-ordering branch from eb6669b to 7bcc98c Compare October 18, 2023 00:11
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 18, 2023
@gjkim42
Copy link
Member

gjkim42 commented Oct 18, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers

@dims
Copy link
Member

dims commented Oct 28, 2023

/milestone v1.29

@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 28, 2023
@MaryamTavakkoli
Copy link

Hello!
Bug triage for the 1.29 release cycle is here!
The code freeze will start (01:00 UTC Wednesday 1st November 2023 / 18:00 PDT Tuesday 31st October 2023). as we want to ensure that each PR has a chance to be merged.

Are we still targeting the 1.29 release?

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2023
@gjkim42
Copy link
Member

gjkim42 commented Oct 30, 2023

LGTM thanks.

/assign @SergeyKanzhelev
for final lgtm.

@tzneal
Copy link
Contributor Author

tzneal commented Oct 30, 2023

/hold for @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 30, 2023
@matthyx
Copy link
Contributor

matthyx commented Oct 30, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 7fab9c44e07d2d525572f1068b164d53eacb4ecc

"k8s.io/kubernetes/pkg/kubelet/types"
)

// terminationOrdering is used to enforce a termination ordering for sidecar containers. It sets up
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Do we have to call this restartable init container instead of the sidecar containers to be consistent with other code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, lets do it in a follow up so we don't lose the approval.

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/unhold
/lgtm

this is good for 1.29. We need a bigger refactoring for enabling restarts

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matthyx, mrunalp, SergeyKanzhelev, tzneal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit ac6f707 into kubernetes:master Oct 31, 2023
17 checks passed
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 31, 2023
@tzneal tzneal deleted the sidecar-termination-ordering branch October 31, 2023 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

9 participants