[PodLevelResources] Propagate Pod level hugepage cgroup to containers #131089
Conversation
/assign @ndixita
/ok-to-test
@@ -321,7 +321,7 @@ func (m *kubeGenericRuntimeManager) calculateLinuxResources(cpuRequest, cpuLimit
 }

 // GetHugepageLimitsFromResources returns limits of each hugepages from resources.
-func GetHugepageLimitsFromResources(resources v1.ResourceRequirements) []*runtimeapi.HugepageLimit {
+func GetHugepageLimitsFromResources(pod *v1.Pod, resources v1.ResourceRequirements) []*runtimeapi.HugepageLimit {
nit: rename resources to containerResources for clarity.
Done, thank you.
continue
}
requiredHugepageLimits[sizeString] = uint64(amountObj.Value())
// If the container and the pod specify, the container will have precedence,
this comment seems confusing. It sounds like the pod-level limit is being overridden.
What I meant is that, if container-level hugepage limits are specified, the container limits take precedence over the pod-level limits with regard to the cgroup values (overriding the cgroup values obtained from the pod-level resources in the previous for loop).
So I removed the overriding terminology and updated the pod-level comment to:
// When hugepage limits are specified at pod level and no hugepage limits are
// specified at container level, the container's cgroup will reflect the pod level limit.
And the container level comment to:
// If both the pod level and the container level specify hugepages, the container
// level will have precedence, so the container's hugepages limit will be reflected
// in the container's cgroup values.
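For illustration, here is a minimal sketch of that precedence rule. The helper and map names are hypothetical, not the actual kubelet types: pod-level hugepage limits seed the result first, and container-level limits for the same page size then replace them.

package main

import "fmt"

// resolveHugepageLimits is a hypothetical helper (not the real kubelet code):
// pod-level limits are applied first, then container-level limits override any
// page size the container specifies itself.
func resolveHugepageLimits(podLimits, containerLimits map[string]uint64) map[string]uint64 {
    resolved := make(map[string]uint64, len(podLimits))
    for size, limit := range podLimits {
        resolved[size] = limit // container inherits the pod-level cgroup value
    }
    for size, limit := range containerLimits {
        resolved[size] = limit // container-level limit takes precedence
    }
    return resolved
}

func main() {
    podLimits := map[string]uint64{"2Mi": 4 << 20, "1Gi": 1 << 30}
    containerLimits := map[string]uint64{"2Mi": 2 << 20} // overrides only the 2Mi size
    fmt.Println(resolveHugepageLimits(podLimits, containerLimits))
    // Prints: map[1Gi:1073741824 2Mi:2097152]
}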
if err != nil {
klog.InfoS("Failed to get hugepage size from resource", "object", resourceObj, "err", err)
continue
// When hugepage limits are specified at pod level, the container limit will
The comment seems confusing. If container-level values are set, they are used as-is. Use of "precedence" in this case can be confusing.
For logical clarity: Is it possible to populate requiredHugepageLimits using container-level values first, and then propagate pod-level limits to container-level limits only if the respective key (i.e. sizeString) doesn't already exist in requiredHugepageLimits?
Refer to answer in #131089 (comment)
Updated comment to:
// When hugepage limits are specified at pod level and no hugepage limits are
// specified at container level, the container's cgroup will reflect the pod level limit.
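A minimal sketch of the ordering suggested by the reviewer (a hypothetical helper, not the actual implementation): container-level limits are recorded first, and pod-level limits only fill in page sizes the container left unset. The resulting limits are the same as in the sketch above.

// fillMissingHugepageLimits is a hypothetical helper illustrating the suggested
// ordering: container-level values first, pod-level values only as a fallback.
func fillMissingHugepageLimits(containerLimits, podLimits map[string]uint64) map[string]uint64 {
    resolved := make(map[string]uint64, len(containerLimits))
    for size, limit := range containerLimits {
        resolved[size] = limit
    }
    for size, limit := range podLimits {
        if _, ok := resolved[size]; !ok {
            resolved[size] = limit // propagate the pod-level limit only when unset
        }
    }
    return resolved
}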
@@ -4393,6 +4393,22 @@ func validatePodResourceConsistency(spec *core.PodSpec, fldPath *field.Path) fie
}
}

// Pod-level hugepage limits must be >= aggregate limits of all containers in a pod.
It might be worth adding a summarized comment for why it is not allowed to set pod-level huge pages limits to be greater than aggregated container-level huge pages limits, and why huge pages overcommitment is not allowed even with the pod-level resources feature.
We will also have to update the documentation & KEP with this information.
Updated comment to:
// Pod level hugepage limits must always be equal to or greater than the aggregated
// container level hugepage limits. This is because hugepage resources are
// treated as a non-overcommittable resource (request and limit must be equal)
// under the current container level hugepage behavior.
// This is also why hugepage overcommitment is not allowed in pod level resources:
// the pod cgroup values must reflect the request/limit set at pod level, and the
// container level cgroup values must be within that limit.
/retest
The aggregated container hugepage limits cannot be greater than the pod-level limits. This was already enforced through the requests defaulted from the specified limits; however, it was not made explicit for both hugepage requests and limits.
Pod-level hugepage resources are not propagated to the containers; only the pod-level cgroup values are propagated to containers that do not specify hugepage resources.
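For example, assuming the alpha pod-level resources field on PodSpec (PodLevelResources feature gate), with v1 = k8s.io/api/core/v1 and resource = k8s.io/apimachinery/pkg/api/resource, a pod like the following sketch would have its container's hugetlb cgroup reflect the pod-level 4Mi limit, because the container sets no hugepage limits of its own:

// Illustrative only; field names assume the alpha PodLevelResources API.
pod := &v1.Pod{
    Spec: v1.PodSpec{
        Resources: &v1.ResourceRequirements{
            Limits: v1.ResourceList{
                v1.ResourceCPU:                   resource.MustParse("1"),
                v1.ResourceMemory:                resource.MustParse("512Mi"),
                v1.ResourceName("hugepages-2Mi"): resource.MustParse("4Mi"),
            },
        },
        Containers: []v1.Container{{
            Name:  "app",
            Image: "registry.k8s.io/pause:3.10",
            // No hugepage resources here, so this container's cgroup is
            // expected to use the pod-level hugepages-2Mi limit.
        }},
    },
}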
What type of PR is this?
What this PR does / why we need it:
Follow-up of [PodLevelResources] Pod Level Hugepage Resources.
This PR propagates the pod-level hugepage cgroup values to containers, with the following changes:
Additionally adds:
Which issue(s) this PR fixes:
Fixes #132543
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: