Skip to content

[PodLevelResources] Propagate Pod level hugepage cgroup to containers #131089

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

KevinTMtz
Copy link
Contributor

@KevinTMtz KevinTMtz commented Mar 27, 2025

What type of PR is this?

What this PR does / why we need it:

Follow up of [PodLevelResources] Pod Level Hugepage Resources.

This PR propagates Pod level hugepage cgroup to containers with the following changes:

  1. Pod level hugepage cgroup when unset in container
  2. Unit test propagate pod level hugepages to containers

Additionally adds:

  1. Validation logic for pod level hugepages
  2. Unit test pod level hugepage default and validation logic
  3. E2E tests for container hugepage resources immutability

Which issue(s) this PR fixes:

Fixes #132543

Special notes for your reviewer:

Does this PR introduce a user-facing change?

- Changes underlying logic to propagate Pod level hugepage cgroup to containers when they do not specify hugepage resources.
- Adds validation to enforce the hugepage aggregated container limits to be smaller or equal to pod-level limits. This was already enforced with the defaulted requests from the specified limits, however it did not make it clear about both hugepage requests and limits.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [Other doc]: https://docs.google.com/document/d/1JaqE2eRmFAPlRayv8vsAWE4SmQCVXQLr9rFPhEaPlvQ/edit?usp=sharing

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 27, 2025
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

Hi @KevinTMtz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 27, 2025
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 27, 2025
@KevinTMtz
Copy link
Contributor Author

/assign @ndixita

@bart0sh bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Apr 7, 2025
@ndixita ndixita moved this to Needs Triage in SIG Node: Pod Level Resources Jun 23, 2025
@KevinTMtz KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from b48a3ed to 3ea8028 Compare June 24, 2025 21:31
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jun 24, 2025
@KevinTMtz KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from 3ea8028 to ec198a6 Compare June 25, 2025 17:50
@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Jun 25, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Jun 25, 2025
@KevinTMtz KevinTMtz force-pushed the pod-level-hugepage-cgroups branch 2 times, most recently from 2b53993 to 141815a Compare June 25, 2025 18:00
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 25, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: KevinTMtz
Once this PR has been reviewed and has the lgtm label, please assign msau42, tallclair for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 25, 2025
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 25, 2025
@ndixita
Copy link
Contributor

ndixita commented Jun 27, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 27, 2025
@@ -321,7 +321,7 @@ func (m *kubeGenericRuntimeManager) calculateLinuxResources(cpuRequest, cpuLimit
}

// GetHugepageLimitsFromResources returns limits of each hugepages from resources.
func GetHugepageLimitsFromResources(resources v1.ResourceRequirements) []*runtimeapi.HugepageLimit {
func GetHugepageLimitsFromResources(pod *v1.Pod, resources v1.ResourceRequirements) []*runtimeapi.HugepageLimit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename resources to containerResources for clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you.

continue
}
requiredHugepageLimits[sizeString] = uint64(amountObj.Value())
// If the container and the pod specify, the container will have precedence,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment seems confusing. It sounds like pod-level limit is being overriden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that the container limits will have precedence over the pod level limits in regards to the cgroup values (overriding the cgroups obtained from the pod level resources in the previous for loop), if container level hugepage limits are specified.

So I removed the overriding terminology and updated the pod level comment to:

	// When hugepage limits are specified at pod level and no hugepage limits are
	// specified at container level, the container's cgroup will reflect the pod level limit.

And the container level comment to:

	// If both the pod level and the container level specify hugepages, the container
	// level will have precedence, so the container's hugepages limit will be reflected
	// in the container's cgroup values.

if err != nil {
klog.InfoS("Failed to get hugepage size from resource", "object", resourceObj, "err", err)
continue
// When hugepage limits are specified at pod level, the container limit will
Copy link
Contributor

@ndixita ndixita Jun 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems confusing. If container-level values are set, they are used as-is. Use of "precedence" in this case can be confusing.

For logical clarity: Is it possible to populate requiredHugepageLimits using container-level values first, and then propagate pod-level limits to container-level limits only if the the respective key i.e. sizeString doesn't exist in requiredHugepageLimits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to answer in #131089 (comment)

Updated comment to:

	// When hugepage limits are specified at pod level and no hugepage limits are
	// specified at container level, the container's cgroup will reflect the pod level limit.

@@ -4393,6 +4393,22 @@ func validatePodResourceConsistency(spec *core.PodSpec, fldPath *field.Path) fie
}
}

// Pod-level hugepage limits must be >= aggregate limits of all containers in a pod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding a summarized comment for why it is not allowed to set pod-level huge pages limits to be greater than aggregated container-level huge pages limits and why huge pages overcommitment is not allowed even with pod-level resources feature.

We will also have to update the documentation & KEP with this information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment to:

	// Pod level hugepage limits must be always equal or greater than the aggregated
	// container level hugepage limits, this is due to the hugepage resources being
	// threated as a non overcommitable resource (request and limit must be equal)
	// for the current container level hugepage behavior.
	// This is also why hugepages overcommitment is not allowed in pod level resources,
	// the pod cgroup values must reflect the request/limit set at pod level, and the
	// container level cgroup values must be within that limit.

@ndixita
Copy link
Contributor

ndixita commented Jun 30, 2025

/retest

@k8s-ci-robot
Copy link
Contributor

@KevinTMtz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-linter-hints 0921d68 link false /test pull-kubernetes-linter-hints
pull-kubernetes-verify 0921d68 link true /test pull-kubernetes-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

The hugepage aggregated container limits cannot be greater than pod-level limits.

This was already enforced with the defaulted requests from the specfied
limits, however it did not make it clear about both hugepage requests and limits.
Pod level hugepage resources are not propagated to the containers, only pod level cgroup values are propagated to the containers when they do not specify hugepage resources.
@KevinTMtz KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from 0921d68 to 063042f Compare June 30, 2025 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

Support for Pod-Level Hugepage Resources
3 participants