[FG:InPlacePodVerticalScaling] Enable removing resource limits from containers #127143
base: master
Conversation
Hi @hshiina. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/kind feature

/ok-to-test
Force-pushed from f5c6754 to 54bc5fd (Compare)
/test pull-kubernetes-node-kubelet-serial-podresize

/assign
```diff
-if cStatus.Resources.CPULimit.MilliValue() > cm.MinMilliCPULimit ||
-	resources.Limits.Cpu().MilliValue() > cm.MinMilliCPULimit {
-	resources.Limits[v1.ResourceCPU] = cStatus.Resources.CPULimit.DeepCopy()
+if cStatus.Resources != nil && cStatus.Resources.CPULimit != nil {
```
nit / optional / preexisting (take it or leave it):
this code looks like you are doing (almost) the same thing 3 times in a row, once for cpu limits, then once for memory limits, then once for cpu requests... makes me feel like it can / should be put into a helper (though admittedly this may not be as straightforward as I am imagining in my head)
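For illustration only, a helper along the lines suggested might look like this. The types here are simplified stand-ins, not the actual kubelet structs; the point is that each of the three blocks copies a value reported in the container status into the allocated resources only when the status actually reports one:

```go
package main

import "fmt"

// resourceList is a simplified stand-in for the kubelet's resource list type.
type resourceList map[string]int64

// restoreFromStatus is a hypothetical helper capturing the repeated pattern:
// copy the value observed in the container status into the allocation, but
// only when the status actually reports one.
func restoreFromStatus(dst resourceList, name string, statusValue *int64) {
	if statusValue == nil {
		return // nothing reported in the status; leave the allocation as-is
	}
	dst[name] = *statusValue
}

func main() {
	limits := resourceList{}
	cpu := int64(500) // milli-CPUs reported in the container status
	restoreFromStatus(limits, "cpu", &cpu)
	restoreFromStatus(limits, "memory", nil) // no memory limit reported
	fmt.Println(limits)                      // map[cpu:500]
}
```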
We should do so eventually, but fixing and refactoring at the same time could make review difficult, so I won't introduce a helper in this PR.
`pkg/kubelet/kubelet_pods.go` (Outdated)
```go
	}
} else if resources.Requests != nil && !resources.Requests.Cpu().IsZero() {
	// Only if requests is in the allocation, restore limits to container sutatus from old status.
```
```diff
-// Only if requests is in the allocation, restore limits to container sutatus from old status.
+// Only if requests is in the allocation, restore limits to container status from old status.
```
I fixed it. Thanks!
```go
value := maxValue
podResources.Memory = &value
```
nit:

```diff
-value := maxValue
-podResources.Memory = &value
+podResources.Memory = ptr.To(maxValue)
```
(import "k8s.io/utils/ptr")
same in other places
Done. Thanks.
```go
	return nil
}
if m.cpuCFSQuota && containerResources.Linux.CpuQuota == 0 {
	containerResources.Linux.CpuQuota = -1
```
Is this value `-1` for the same reason that the `maxValue` const you added to `pkg/kubelet/kuberuntime/kuberuntime_manager.go` is `-1`? If so, consider putting the `maxValue` const somewhere that it can be shared in both places.
These are slightly different. This `-1` is passed to the runtime for updating a container resource, while the other is passed to `libcontainer/cgroups` for updating pod cgroups. I added a few comments.
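A minimal sketch of the runtime-side convention being discussed (simplified types, not the real CRI `LinuxContainerResources` struct): a CFS quota of 0 coming from an unlimited allocation is clamped to -1 before the update is sent, because -1 is what the runtime interprets as "no quota":

```go
package main

import "fmt"

// linuxResources is a simplified stand-in for the CRI resources message.
type linuxResources struct{ CpuQuota int64 }

// clampCPUQuota mimics the check shown above: for the container runtime a CPU
// CFS quota of -1 means "no quota", so a 0 from an unlimited allocation is
// rewritten before the container update is issued.
func clampCPUQuota(cpuCFSQuotaEnabled bool, r *linuxResources) {
	if cpuCFSQuotaEnabled && r.CpuQuota == 0 {
		r.CpuQuota = -1 // unlimited, in the runtime's convention
	}
}

func main() {
	r := &linuxResources{CpuQuota: 0}
	clampCPUQuota(true, r)
	fmt.Println(r.CpuQuota) // -1
}
```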
```go
if podResources == nil {
	klog.ErrorS(nil, "Unable to get resource configuration", "pod", pod.Name)
	result.Fail(fmt.Errorf("unable to get resource configuration processing resize for pod %s", pod.Name))
	return
}
```

```go
const maxValue = int64(-1)
```
I see in your comment #127143 (comment) that this doesn't appear to be documented anywhere. Maybe it would be helpful to leave a comment here with this context so that in the future we can remember where this constant came from.
I added a comment.
`test/e2e/framework/pod/resize.go` (Outdated)
```diff
-CPU     string `json:"cpu,omitempty"`
-Memory  string `json:"memory,omitempty"`
-EphStor string `json:"ephemeral-storage,omitempty"`
+CPU     *string `json:"cpu"`
```
why do we need to change this?
I would like to allow `null` in a patch for removing requests and limits.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hshiina. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Thank you for your review.
```go
	return false
}
```
This is changing behavior in a graduated beta API. Should you not first amend the corresponding KEP instead of trying to get the change in directly? Or is this PR just asking for feedback?
Oh, I see @tallclair already commented here: #127143 (comment).
/lgtm
LGTM label has been added. Git tree hash: b9a73fb753a3b57d866fa5738a6c2e463833ffd2
@hshiina want to try and get this merged in v1.34?

Yes, I would like to complete this work if possible.
This fix enables removing resource limits and requests from containers via the following changes:

- Update pod cgroup resources to max.
- Update container resources to max with the CRI API `UpdateContainerResource()`.
- Display resource limits in the container status while limits are being removed.
New changes are detected. LGTM label has been removed.

@hshiina: The following tests failed.

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hello @hshiina @tallclair @natasha41575 |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR fixes the handling of cases where resource limits are max (unlimited) with InPlacePodVerticalScaling.
When one of the containers in a pod has no limit for a resource, no limit is configured in the pod's cgroup for that resource. This fix allows resizing that resource in a container that has limits even when another container in the pod has none. In doPodResizeAction(), this fix allows a container to be resized even if the pod resources are unlimited (the values in the pod resources are nil).
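As a sketch of the nil-means-unlimited handling described above (simplified types; `maxValue` mirrors the constant this PR adds, since `libcontainer/cgroups` treats -1 as "max"):

```go
package main

import "fmt"

// maxValue mirrors the constant added in kuberuntime_manager.go: -1 is the
// value libcontainer/cgroups interprets as "max" (no limit).
const maxValue = int64(-1)

// podCgroupMemory is a hypothetical helper: a nil pod-level limit (some
// container in the pod has no memory limit) maps to an unlimited pod cgroup
// rather than being treated as an error.
func podCgroupMemory(limit *int64) int64 {
	if limit == nil {
		return maxValue
	}
	return *limit
}

func main() {
	fmt.Println(podCgroupMemory(nil)) // -1
	m := int64(1 << 30)
	fmt.Println(podCgroupMemory(&m)) // 1073741824
}
```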
This PR also enables removing resource limits from containers via the following changes:

- Update pod cgroup resources to max.
- Update container resources to max with the CRI API `UpdateContainerResource()`.
- Display resource limits in the container status while limits are being removed.

Which issue(s) this PR fixes:
Fixes #128675
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: