Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't reuse CPU set of a restartable init container #119447

Conversation

gjkim42
Copy link
Member

@gjkim42 gjkim42 commented Jul 19, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

When cpu-manager-policy is static and you create a guaranteed pod with a restartable init container,
a restartable init container shares its CPU set with a regular container.

This PR makes sure that a restartable init container does not share its CPU set with regular containers.

Which issue(s) this PR fixes:

Fixes #
xref: #119442

Special notes for your reviewer:

Before this PR
$ cat <<EOF | kubectl apply -f - 
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  initContainers:
  - name: init-1
    image: ubuntu
    command:
    - sh
    - -c
    - |
      grep Cpus_allowed_list /proc/self/status | cut -f2
    volumeMounts:
    - name: shared
      mountPath: /tmp/shared
    resources:
      limits:
        cpu: 8
        memory: 100Mi
  - name: restartable-init-2
    image: ubuntu
    command:
    - sh
    - -c
    - |
      grep Cpus_allowed_list /proc/self/status | cut -f2
      sleep 300s
    volumeMounts:
    - name: shared
      mountPath: /tmp/shared
    resources:
      limits:
        cpu: 4
        memory: 200Mi
    restartPolicy: Always
  containers:
  - name: regular-3
    image: ubuntu
    command:
    - sh
    - -c
    - |
      grep Cpus_allowed_list /proc/self/status | cut -f2
      sleep 300s
    resources:
      limits:
        cpu: 4
        memory: 300Mi
    volumeMounts:
    - name: shared
      mountPath: /tmp/shared
  terminationGracePeriodSeconds: 1
  volumes:
  - name: shared
    emptyDir: {}
  restartPolicy: Never
EOF
pod/test created
...
$ kubectl logs test -c init-1
0-3,6-9
$ kubectl logs test -c restartable-init-2
0-1,6-7  # init container and restartalbe init container can share the CPUs, as "init-1" is already terminated and will not restart.
$ kubectl logs test -c regular-3         
0-1,6-7  # restartable init container and regular container should not share the CPUs, as they should run together.
After this PR
$ cat <<EOF | kubectl apply -f - 
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  initContainers:
  - name: init-1
    image: ubuntu
    command:
    - sh
    - -c
    - |
      grep Cpus_allowed_list /proc/self/status | cut -f2
    volumeMounts:
    - name: shared
      mountPath: /tmp/shared
    resources:
      limits:
        cpu: 8
        memory: 100Mi
  - name: restartable-init-2
    image: ubuntu
    command:
    - sh
    - -c
    - |
      grep Cpus_allowed_list /proc/self/status | cut -f2
      sleep 300s
    volumeMounts:
    - name: shared
      mountPath: /tmp/shared
    resources:
      limits:
        cpu: 4
        memory: 200Mi
    restartPolicy: Always
  containers:
  - name: regular-3
    image: ubuntu
    command:
    - sh
    - -c
    - |
      grep Cpus_allowed_list /proc/self/status | cut -f2
      sleep 300s
    resources:
      limits:
        cpu: 4
        memory: 300Mi
    volumeMounts:
    - name: shared
      mountPath: /tmp/shared
  terminationGracePeriodSeconds: 1
  volumes:
  - name: shared
    emptyDir: {}
  restartPolicy: Never
EOF
pod/test created
...
$ kubectl logs test -c init-1
0-3,6-9
$ kubectl logs test -c restartable-init-2
0-1,6-7
$ kubectl logs test -c regular-3
2-3,8-9

Does this PR introduce a user-facing change?

Fixed a bug where the CPU set allocated to an init container, with containerRestartPolicy of `Always`, were erroneously reused by a regular container.

wrote "none", as we already added a release note in #116429.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers

/cc @SergeyKanzhelev @tzneal @klueska
/assign @mrunalp
/priority important-soon

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Jul 19, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. labels Jul 19, 2023
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 19, 2023
@gjkim42
Copy link
Member Author

gjkim42 commented Jul 19, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-e2e-gce-cos-alpha-features

@gjkim42
Copy link
Member Author

gjkim42 commented Jul 19, 2023

/retest

1 similar comment
@gjkim42
Copy link
Member Author

gjkim42 commented Jul 19, 2023

/retest

@SergeyKanzhelev
Copy link
Member

@ffromani wdyt about this change? Is it small enough to take after code freeze? There are more things around topology need fixing like #119442 (comment). I wonder if it is better to address them one by one or fix all of them in 1.29? Any suggestions would be appreciated.

@gjkim42 gjkim42 force-pushed the do-not-reuse-cpu-set-of-restartable-init-container branch from 92163b8 to f584da6 Compare July 20, 2023 00:37
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Jul 20, 2023
@bart0sh
Copy link
Contributor

bart0sh commented Jul 20, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 20, 2023
@ffromani
Copy link
Contributor

ffromani commented Jul 21, 2023

@ffromani wdyt about this change? Is it small enough to take after code freeze? There are more things around topology need fixing like #119442 (comment). I wonder if it is better to address them one by one or fix all of them in 1.29? Any suggestions would be appreciated.

It seems to me that without these fixes the exclusive resource allocation (cpus, memory, devices) are broken for sidecar containers, which is bad but (hopefully?) not deal-breaking bad for a alpha feature. We should however fix them all ASAP and at worst by 1.29 to enable better experience for sidecar containers.

I'm neutral with backports: I'd love to have the fixes in 1.28.z but I understand the risks (again talking about a alpha feature). I guess we can discuss with the release folks later.

I think among the resource managers the most critical to fix is the device manager because pinning CPUs and memory is somehow considered "advanced topic" while accessing devices is more common AFAIK.
I'd check, likely fix and then propose a backport for device manager first, then the pair cpu and memory manager with the same priority (different PRs of course).

Regarding this specific PR: at glance looks nice (I'll have a proper review ASAP). If we want to make it easy to backport, is probably fine; otherwise (and the preferred choice for me) we should have a e2e tests implementing the repro outlined in the PR cover letter.

@gjkim42 gjkim42 force-pushed the do-not-reuse-cpu-set-of-restartable-init-container branch from f584da6 to bd13ed2 Compare July 21, 2023 12:15
@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jul 21, 2023
@gjkim42 gjkim42 force-pushed the do-not-reuse-cpu-set-of-restartable-init-container branch from bd13ed2 to a41c498 Compare July 21, 2023 12:47
@gjkim42
Copy link
Member Author

gjkim42 commented Jul 21, 2023

/cc @ffromani

Thanks for your input and I am glad that you can check the device manager logic.

we should have a e2e tests implementing the repro outlined in the PR cover letter.

added an e2e test.

@gjkim42
Copy link
Member Author

gjkim42 commented Sep 25, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers

1 similar comment
@gjkim42
Copy link
Member Author

gjkim42 commented Sep 26, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers

@matthyx
Copy link
Contributor

matthyx commented Sep 28, 2023

@ffromani can you review again? We're ready to merge this, thanks.

framework.ExpectNoError(err, "expected log not found in container [%s] of pod [%s]", pod.Spec.Containers[0].Name, pod.Name)

framework.Logf("got pod logs: %v", logs)
cpus, err = cpuset.Parse(strings.TrimSpace(logs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than overwriting this variable with CPUs allocated to the app container, how about we store it in a separate variable and compare it with the CPUs allocated to initcontainer (here)? Ideally we should be able to show that CPUs that were allocated to initcontainer got reallocated to the app container in this case.

Checking intersection nonReusableCPUs and cpus allocated to app container shows us that CPUs were allocated exclusively as expected but maybe we can go a step further and demonstrate that CPUs were reused for non-restartable initcontainers as part of this e2e test too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks for the review.

@gjkim42 gjkim42 force-pushed the do-not-reuse-cpu-set-of-restartable-init-container branch from 752275a to bbd8406 Compare October 6, 2023 11:37
@gjkim42
Copy link
Member Author

gjkim42 commented Oct 6, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers

@gjkim42 gjkim42 force-pushed the do-not-reuse-cpu-set-of-restartable-init-container branch from bbd8406 to 8b5f30e Compare October 6, 2023 13:16
@gjkim42
Copy link
Member Author

gjkim42 commented Oct 6, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Oct 6, 2023

@gjkim42: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-node-kubelet-serial-containerd 97bcb48 link false /test pull-kubernetes-node-kubelet-serial-containerd
pull-kubernetes-node-kubelet-serial-containerd-alpha-features 97bcb48 link false /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@gjkim42
Copy link
Member Author

gjkim42 commented Oct 6, 2023

/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers

Copy link
Contributor

@swatisehgal swatisehgal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 30a12830aa88bb494c8e5ab6292551b05e16c7a1

@swatisehgal swatisehgal moved this from PRs - Needs Reviewer to PRs - Needs Approver in SIG Node CI/Test Board Oct 7, 2023
@swatisehgal swatisehgal moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Oct 7, 2023
@swatisehgal
Copy link
Contributor

/assign @mrunalp @klueska
for approval

@gjkim42
Copy link
Member Author

gjkim42 commented Oct 30, 2023

/assign @mrunalp

Could you take a look at this? This fixes a bug that regular containers reuse the sidecar container's CPU resource.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gjkim42, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2023
@k8s-ci-robot k8s-ci-robot merged commit bfeb3c2 into kubernetes:master Oct 31, 2023
17 checks passed
SIG Node CI/Test Board automation moved this from PRs - Needs Approver to Done Oct 31, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 31, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 31, 2023
@gjkim42 gjkim42 deleted the do-not-reuse-cpu-set-of-restartable-init-container branch October 31, 2023 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

9 participants