-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't reuse CPU set of a restartable init container #119447
Don't reuse CPU set of a restartable init container #119447
Conversation
/test pull-kubernetes-node-e2e-containerd-sidecar-containers |
/retest |
1 similar comment
/retest |
@ffromani wdyt about this change? Is it small enough to take after code freeze? There are more things around topology need fixing like #119442 (comment). I wonder if it is better to address them one by one or fix all of them in 1.29? Any suggestions would be appreciated. |
92163b8
to
f584da6
Compare
/triage accepted |
It seems to me that without these fixes the exclusive resource allocation (cpus, memory, devices) are broken for sidecar containers, which is bad but (hopefully?) not deal-breaking bad for a alpha feature. We should however fix them all ASAP and at worst by 1.29 to enable better experience for sidecar containers. I'm neutral with backports: I'd love to have the fixes in 1.28.z but I understand the risks (again talking about a alpha feature). I guess we can discuss with the release folks later. I think among the resource managers the most critical to fix is the device manager because pinning CPUs and memory is somehow considered "advanced topic" while accessing devices is more common AFAIK. Regarding this specific PR: at glance looks nice (I'll have a proper review ASAP). If we want to make it easy to backport, is probably fine; otherwise (and the preferred choice for me) we should have a e2e tests implementing the repro outlined in the PR cover letter. |
f584da6
to
bd13ed2
Compare
bd13ed2
to
a41c498
Compare
/cc @ffromani Thanks for your input and I am glad that you can check the device manager logic.
added an e2e test. |
1b21c31
to
752275a
Compare
/test pull-kubernetes-node-e2e-containerd-sidecar-containers |
1 similar comment
/test pull-kubernetes-node-e2e-containerd-sidecar-containers |
@ffromani can you review again? We're ready to merge this, thanks. |
test/e2e_node/cpu_manager_test.go
Outdated
framework.ExpectNoError(err, "expected log not found in container [%s] of pod [%s]", pod.Spec.Containers[0].Name, pod.Name) | ||
|
||
framework.Logf("got pod logs: %v", logs) | ||
cpus, err = cpuset.Parse(strings.TrimSpace(logs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than overwriting this variable with CPUs allocated to the app container, how about we store it in a separate variable and compare it with the CPUs allocated to initcontainer (here)? Ideally we should be able to show that CPUs that were allocated to initcontainer got reallocated to the app container in this case.
Checking intersection nonReusableCPUs
and cpus allocated to app container shows us that CPUs were allocated exclusively as expected but maybe we can go a step further and demonstrate that CPUs were reused for non-restartable initcontainers as part of this e2e test too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks for the review.
752275a
to
bbd8406
Compare
/test pull-kubernetes-node-e2e-containerd-sidecar-containers |
bbd8406
to
8b5f30e
Compare
/test pull-kubernetes-node-e2e-containerd-sidecar-containers |
@gjkim42: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/test pull-kubernetes-node-e2e-containerd-sidecar-containers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
LGTM label has been added. Git tree hash: 30a12830aa88bb494c8e5ab6292551b05e16c7a1
|
/assign @mrunalp Could you take a look at this? This fixes a bug that regular containers reuse the sidecar container's CPU resource. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gjkim42, mrunalp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
When cpu-manager-policy is static and you create a guaranteed pod with a restartable init container,
a restartable init container shares its CPU set with a regular container.
This PR makes sure that a restartable init container does not share its CPU set with regular containers.
Which issue(s) this PR fixes:
Fixes #
xref: #119442
Special notes for your reviewer:
Before this PR
After this PR
Does this PR introduce a user-facing change?
wrote "none", as we already added a release note in #116429.
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/cc @SergeyKanzhelev @tzneal @klueska
/assign @mrunalp
/priority important-soon