[PodLevelResources] Propagate Pod level hugepage cgroup to containers #131089
Conversation
/assign @ndixita
/ok-to-test
@@ -321,7 +321,7 @@ func (m *kubeGenericRuntimeManager) calculateLinuxResources(cpuRequest, cpuLimit
 }

 // GetHugepageLimitsFromResources returns limits of each hugepages from resources.
-func GetHugepageLimitsFromResources(resources v1.ResourceRequirements) []*runtimeapi.HugepageLimit {
+func GetHugepageLimitsFromResources(pod *v1.Pod, resources v1.ResourceRequirements) []*runtimeapi.HugepageLimit {
nit: rename resources to containerResources for clarity.
Done, thank you.
continue
}
requiredHugepageLimits[sizeString] = uint64(amountObj.Value())
// If the container and the pod specify, the container will have precedence,
this comment seems confusing. It sounds like the pod-level limit is being overridden.
What I meant is that, if container-level hugepage limits are specified, the container limits take precedence over the pod-level limits with regard to the cgroup values (overriding the cgroup values obtained from the pod-level resources in the previous for loop).
So I removed the overriding terminology and updated the pod-level comment to:
// When hugepage limits are specified at pod level and no hugepage limits are
// specified at container level, the container's cgroup will reflect the pod level limit.
And the container level comment to:
// If both the pod level and the container level specify hugepages, the container
// level will have precedence, so the container's hugepages limit will be reflected
// in the container's cgroup values.
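For illustration, here is a minimal sketch of that precedence rule. The helper and map names are hypothetical, not the actual kubelet types: pod-level hugepage limits seed the result first, and container-level limits for the same page size then replace them.

package main

import "fmt"

// resolveHugepageLimits is a hypothetical helper (not the real kubelet code):
// pod-level limits are applied first, then container-level limits override any
// page size the container specifies itself.
func resolveHugepageLimits(podLimits, containerLimits map[string]uint64) map[string]uint64 {
    resolved := make(map[string]uint64, len(podLimits))
    for size, limit := range podLimits {
        resolved[size] = limit // container inherits the pod-level cgroup value
    }
    for size, limit := range containerLimits {
        resolved[size] = limit // container-level limit takes precedence
    }
    return resolved
}

func main() {
    podLimits := map[string]uint64{"2Mi": 4 << 20, "1Gi": 1 << 30}
    containerLimits := map[string]uint64{"2Mi": 2 << 20} // overrides only the 2Mi size
    fmt.Println(resolveHugepageLimits(podLimits, containerLimits))
    // Prints: map[1Gi:1073741824 2Mi:2097152]
}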
if err != nil {
klog.InfoS("Failed to get hugepage size from resource", "object", resourceObj, "err", err)
continue
// When hugepage limits are specified at pod level, the container limit will
The comment seems confusing. If container-level values are set, they are used as-is. Use of "precedence" in this case can be confusing.
For logical clarity: Is it possible to populate requiredHugepageLimits using container-level values first, and then propagate pod-level limits to container-level limits only if the respective key (i.e. sizeString) doesn't already exist in requiredHugepageLimits?
Refer to answer in #131089 (comment)
Updated comment to:
// When hugepage limits are specified at pod level and no hugepage limits are
// specified at container level, the container's cgroup will reflect the pod level limit.
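A minimal sketch of the ordering suggested by the reviewer (a hypothetical helper, not the actual implementation): container-level limits are recorded first, and pod-level limits only fill in page sizes the container left unset. The resulting limits are the same as in the sketch above.

// fillMissingHugepageLimits is a hypothetical helper illustrating the suggested
// ordering: container-level values first, pod-level values only as a fallback.
func fillMissingHugepageLimits(containerLimits, podLimits map[string]uint64) map[string]uint64 {
    resolved := make(map[string]uint64, len(containerLimits))
    for size, limit := range containerLimits {
        resolved[size] = limit
    }
    for size, limit := range podLimits {
        if _, ok := resolved[size]; !ok {
            resolved[size] = limit // propagate the pod-level limit only when unset
        }
    }
    return resolved
}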
@@ -4393,6 +4393,22 @@ func validatePodResourceConsistency(spec *core.PodSpec, fldPath *field.Path) fie
}
}

// Pod-level hugepage limits must be >= aggregate limits of all containers in a pod.
It might be worth adding a summarized comment for why it is not allowed to set pod-level huge pages limits to be greater than aggregated container-level huge pages limits, and why huge pages overcommitment is not allowed even with the pod-level resources feature.
We will also have to update the documentation & KEP with this information.
Updated comment to:
// Pod level hugepage limits must always be equal to or greater than the aggregated
// container level hugepage limits. This is because hugepage resources are
// treated as a non-overcommittable resource (request and limit must be equal)
// under the current container level hugepage behavior.
// This is also why hugepage overcommitment is not allowed in pod level resources:
// the pod cgroup values must reflect the request/limit set at pod level, and the
// container level cgroup values must be within that limit.
/retest
The aggregated container hugepage limits cannot be greater than the pod-level limits. This was already enforced through the requests defaulted from the specified limits; however, it was not made explicit for both hugepage requests and limits.
Pod-level hugepage resources are not propagated to the containers; only the pod-level cgroup values are propagated to containers that do not specify hugepage resources.
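For example, assuming the alpha pod-level resources field on PodSpec (PodLevelResources feature gate), with v1 = k8s.io/api/core/v1 and resource = k8s.io/apimachinery/pkg/api/resource, a pod like the following sketch would have its container's hugetlb cgroup reflect the pod-level 4Mi limit, because the container sets no hugepage limits of its own:

// Illustrative only; field names assume the alpha PodLevelResources API.
pod := &v1.Pod{
    Spec: v1.PodSpec{
        Resources: &v1.ResourceRequirements{
            Limits: v1.ResourceList{
                v1.ResourceCPU:                   resource.MustParse("1"),
                v1.ResourceMemory:                resource.MustParse("512Mi"),
                v1.ResourceName("hugepages-2Mi"): resource.MustParse("4Mi"),
            },
        },
        Containers: []v1.Container{{
            Name:  "app",
            Image: "registry.k8s.io/pause:3.10",
            // No hugepage resources here, so this container's cgroup is
            // expected to use the pod-level hugepages-2Mi limit.
        }},
    },
}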
What type of PR is this?
What this PR does / why we need it:
Follow-up of [PodLevelResources] Pod Level Hugepage Resources.
This PR propagates the pod-level hugepage cgroup values to containers, with the following changes:
Additionally adds:
Which issue(s) this PR fixes:
Fixes #132543
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: