[PodLevelResources] Pod Level Resources Eviction Manager #132277

Draft: wants to merge 4 commits into master

KevinTMtz
Contributor

@KevinTMtz KevinTMtz commented Jun 12, 2025

What type of PR is this?

What this PR does / why we need it:

This PR adds pod-level resources support to the eviction manager, which requires the following changes:

  1. Use pod level resources for eviction manager
  2. Unit tests for pod level resources eviction manager
  3. E2E tests for pod level resources eviction manager
  4. E2E tests for pod level resources Kubelet Preemption
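As a rough illustration of item 1, the sketch below ranks eviction candidates under memory pressure using an effective request that, by assumption, is the pod-level request when one is set and the container-level total otherwise. The ordering follows the kubelet's documented eviction order (usage above requests first, then lower priority, then larger usage relative to requests); `candidate` and `rankForMemoryPressure` are hypothetical names, and this is not the code in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, simplified eviction candidate.
// request is the effective memory request: assumed to be the pod-level
// request when set, otherwise the sum of container-level requests.
type candidate struct {
	name     string
	priority int32
	usage    int64 // observed memory usage in bytes
	request  int64 // effective memory request in bytes
}

// rankForMemoryPressure sorts candidates so that the first element is the
// first to be evicted under the simplified ordering described above.
func rankForMemoryPressure(cs []candidate) {
	sort.SliceStable(cs, func(i, j int) bool {
		ei := cs[i].usage > cs[i].request
		ej := cs[j].usage > cs[j].request
		if ei != ej {
			return ei // pods exceeding their request come first
		}
		if cs[i].priority != cs[j].priority {
			return cs[i].priority < cs[j].priority // lower priority first
		}
		// Larger usage above the request comes first.
		return cs[i].usage-cs[i].request > cs[j].usage-cs[j].request
	})
}

func main() {
	const mi = int64(1 << 20)
	cs := []candidate{
		{name: "within-request", usage: 200 * mi, request: 500 * mi},
		{name: "over-small", usage: 600 * mi, request: 500 * mi},
		{name: "over-large", usage: 900 * mi, request: 500 * mi},
	}
	rankForMemoryPressure(cs)
	for _, c := range cs {
		fmt.Println(c.name)
	}
}
```

With the sample values, `main` prints `over-large`, `over-small`, then `within-request`. `sort.SliceStable` keeps the input order for candidates that compare equal.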

Which issue(s) this PR is related to:

Fixes #132448

Special notes for your reviewer:

Does this PR introduce a user-facing change?

- No. It changes the underlying logic of the eviction manager's helper functions.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 12, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Hi @KevinTMtz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 12, 2025
@k8s-ci-robot k8s-ci-robot requested review from feiskyer and matthyx June 12, 2025 23:38
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 12, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 12, 2025
@KevinTMtz KevinTMtz force-pushed the pod-level-resources-eviction-manager branch from f0a8d0f to 4a2fd75 June 12, 2025 23:51
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 12, 2025
@KevinTMtz KevinTMtz changed the title Pod level resources eviction manager [PodLevelResources] Pod Level Resources Eviction Manager Jun 13, 2025
@KevinTMtz
Contributor Author

/assign @ndixita

@ndixita
Contributor

ndixita commented Jun 13, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jun 13, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 13, 2025

switch resourceToReclaim {
case v1.ResourceMemory:
	podUsage = memoryUsage(podStats.Memory)
Contributor

There may be cases where stats cannot be collected via the Kubelet Summary API, so it might be better to perform a nil check for podStats.Memory here, just like with container-level resource requests.

Contributor Author

Done, thank you
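A minimal sketch of the nil check the reviewer asks for. `PodStats` and `MemoryStats` are hypothetical, simplified stand-ins for the Summary API types (only `WorkingSetBytes` is modeled), and `podMemoryUsage` is an illustrative helper, not the function in the PR. The idea is to skip the pod-level usage line when stats are missing instead of dereferencing a nil pointer.

```go
package main

import "fmt"

// MemoryStats is a hypothetical stand-in for the Summary API memory stats.
type MemoryStats struct {
	WorkingSetBytes *uint64
}

// PodStats is a hypothetical stand-in for the Summary API pod stats.
type PodStats struct {
	Name   string
	Memory *MemoryStats
}

// podMemoryUsage returns the pod's working-set bytes and true, or 0 and
// false when the stats were not collected (a nil pointer at any level).
func podMemoryUsage(s *PodStats) (uint64, bool) {
	if s == nil || s.Memory == nil || s.Memory.WorkingSetBytes == nil {
		return 0, false
	}
	return *s.Memory.WorkingSetBytes, true
}

func main() {
	ws := uint64(512 << 20)
	withStats := &PodStats{Name: "a", Memory: &MemoryStats{WorkingSetBytes: &ws}}
	noStats := &PodStats{Name: "b"}

	for _, s := range []*PodStats{withStats, noStats} {
		if usage, ok := podMemoryUsage(s); ok {
			fmt.Printf("%s: %d bytes\n", s.Name, usage)
		} else {
			fmt.Printf("%s: stats unavailable, skipping pod-level message\n", s.Name)
		}
	}
}
```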

	podUsage = resource.NewQuantity(int64(*podStats.EphemeralStorage.UsedBytes), resource.BinarySI)
}

message += fmt.Sprintf("Pod %s was using %s, request is %s, has larger consumption of %v. ", pod.Name, podUsage.String(), podRequest.String(), resourceToReclaim)
Contributor

nit: It might be better to extract the format of this message into a variable (e.g. podMessageFmt), similar to containerMessageFmt.

Contributor Author

Done, thank you
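A sketch of the suggested refactor, assuming the pod-level format string stays as written in this diff. The constant name `podMessageFmt` follows the reviewer's suggestion, the container format reproduces the wording seen in the kubelet event text in this PR, and `usageLine` and `evictionDetails` are hypothetical helpers that take pre-rendered strings instead of `resource.Quantity` values.

```go
package main

import "fmt"

const (
	// podMessageFmt: pod-level line, format taken from this diff.
	podMessageFmt = "Pod %s was using %s, request is %s, has larger consumption of %v. "
	// containerMessageFmt: container-level line, matching the event wording.
	containerMessageFmt = "Container %s was using %s, request is %s, has larger consumption of %v. "
)

// usageLine is a hypothetical row with usage and request already rendered.
type usageLine struct {
	Name, Usage, Request string
}

// evictionDetails writes the pod-level line first, then one line per container.
func evictionDetails(pod usageLine, containers []usageLine, resourceName string) string {
	message := fmt.Sprintf(podMessageFmt, pod.Name, pod.Usage, pod.Request, resourceName)
	for _, c := range containers {
		message += fmt.Sprintf(containerMessageFmt, c.Name, c.Usage, c.Request, resourceName)
	}
	return message
}

func main() {
	msg := evictionDetails(
		usageLine{"stressor-sidecar-pod", "12981880Ki", "1G"},
		[]usageLine{{"sidecar-stressor-container", "10141636Ki", "0"}},
		"memory",
	)
	fmt.Println(msg)
}
```

Keeping both formats as constants means the pod and container sentences can only drift apart deliberately.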

	podUsage = resource.NewQuantity(int64(*podStats.EphemeralStorage.UsedBytes), resource.BinarySI)
}

message += fmt.Sprintf("Pod %s was using %s, request is %s, has larger consumption of %v. ", pod.Name, podUsage.String(), podRequest.String(), resourceToReclaim)
Contributor

Just a thought. Consider a Pod that uses pod-level resources and contains three containers.

apiVersion: v1
kind: Pod
metadata:
  name: evicted
spec:
  resources:
    requests:
      memory: 500Mi
  initContainers:
  - name: sidecar1
    image: registry.k8s.io/e2e-test-images/agnhost:2.55
    command: ["sleep", "infinity"]
    restartPolicy: Always
  - name: sidecar2
    image: registry.k8s.io/e2e-test-images/agnhost:2.55
    command: ["sleep", "infinity"]
    restartPolicy: Always
  containers:
  - name: regular
    image: registry.k8s.io/e2e-test-images/agnhost:2.55
    command: ["sleep", "infinity"]
    resources:
      requests:
        memory: 300Mi

If an eviction occurs due to memory usage from sidecar1 or sidecar2, I feel like the current event and annotation alone don't clearly indicate which container actually caused it. If pod-level resources are specified and none of the containers with memory requests exceed their requests, it might be helpful to record the memory usage of the container without a memory request that used the most memory. That way, we at least get a hint about which container likely triggered the eviction. Though maybe this is overkill and not really necessary.
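The suggestion above could look roughly like this: among containers that set no memory request, pick the one with the highest usage. `containerStat` and `topUnrequested` are hypothetical names, the usages are made-up numbers in bytes, and this is not code from the PR.

```go
package main

import "fmt"

// containerStat is a hypothetical, simplified view of one container:
// hasRequest reports whether a memory request is set; usage is in bytes.
type containerStat struct {
	name       string
	hasRequest bool
	usage      int64
}

// topUnrequested returns the container without a memory request that uses
// the most memory, as a hint for which container likely caused the eviction.
func topUnrequested(cs []containerStat) (string, bool) {
	best, found := "", false
	var maxUsage int64
	for _, c := range cs {
		if c.hasRequest {
			continue // only containers without a memory request are considered
		}
		if !found || c.usage > maxUsage {
			best, maxUsage, found = c.name, c.usage, true
		}
	}
	return best, found
}

func main() {
	const mi = int64(1 << 20)
	cs := []containerStat{
		{name: "sidecar1", usage: 50 * mi},
		{name: "sidecar2", usage: 400 * mi},
		{name: "regular", hasRequest: true, usage: 100 * mi},
	}
	if name, ok := topUnrequested(cs); ok {
		fmt.Println(name) // sidecar2
	}
}
```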

Contributor Author

After getting the pod-level resource stats, the function still continues and iterates over all containers (as it did before), so the user would see something like this:

  Warning  Evicted    49s   kubelet            The node was low on resource: memory. Threshold quantity: 80996576029, available: 75619264Ki. Pod stressor-sidecar-pod was using 12981880Ki, request is 1G, has larger consumption of memory. Container sidecar-stressor-container was using 10141636Ki, request is 0, has larger consumption of memory.
  Normal   Killing    49s   kubelet            Stopping container sidecar-stressor-container

It compares the whole pod's usage against its pod-level request, along with the usage and request of each container.

Contributor

Thanks! I hadn't taken into account that when a request isn't specified, the message always shows the actual usage along with a note that it exceeds a request of 0.

@bart0sh bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Jun 15, 2025
@KevinTMtz KevinTMtz force-pushed the pod-level-resources-eviction-manager branch from 4a2fd75 to dec7b9b June 16, 2025 19:00
@@ -39,16 +42,23 @@ func GetResourceRequestQuantity(pod *v1.Pod, resourceName v1.ResourceName) resou
	requestQuantity = resource.Quantity{Format: resource.DecimalSI}
}

for _, container := range pod.Spec.Containers {
	if rQuantity, ok := container.Resources.Requests[resourceName]; ok {
		// Supported pod level resources will be used instead of container level ones when available
Contributor

GetResourceRequestQuantity is called in

requestQuantity := GetResourceRequestQuantity(pod, resource)

which is used in preemption and allocation manager related files. Can we please add e2e coverage for the preemption logic as well, along with the eviction manager e2e tests?

Contributor Author

Done, thank you.
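The diff comment says supported pod-level resources are used instead of container-level ones when available. A simplified sketch of that precedence rule, assuming cpu and memory are the supported pod-level resources; the `Pod` type and `podRequestQuantity` function are hypothetical, and the sketch ignores init containers and pod overhead, which the upstream `GetResourceRequestQuantity` also handles.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified pod: requests are plain integers keyed
// by resource name instead of v1.ResourceList / resource.Quantity.
type Pod struct {
	PodRequests       map[string]int64   // pod-level spec.resources.requests
	ContainerRequests []map[string]int64 // one map per regular container
}

// podLevelSupported is an assumption for this sketch.
var podLevelSupported = map[string]bool{"cpu": true, "memory": true}

// podRequestQuantity prefers a supported pod-level request when one is set
// and otherwise sums the container-level requests.
func podRequestQuantity(pod Pod, resourceName string) int64 {
	if podLevelSupported[resourceName] {
		if q, ok := pod.PodRequests[resourceName]; ok {
			return q
		}
	}
	var total int64
	for _, c := range pod.ContainerRequests {
		total += c[resourceName] // a missing key contributes 0
	}
	return total
}

func main() {
	const mi = int64(1 << 20)
	pod := Pod{
		PodRequests: map[string]int64{"memory": 500 * mi},
		ContainerRequests: []map[string]int64{
			{"memory": 300 * mi, "ephemeral-storage": 1 << 30},
			{},
		},
	}
	fmt.Println(podRequestQuantity(pod, "memory") / mi)         // 500: pod-level wins
	fmt.Println(podRequestQuantity(pod, "ephemeral-storage")) // 1073741824: summed
}
```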

@ndixita ndixita moved this from Triage to PRs Waiting on Author in SIG Node CI/Test Board Jun 17, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 17, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: KevinTMtz
Once this PR has been reviewed and has the lgtm label, please assign mrunalp, thockin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 20, 2025
@@ -147,6 +147,66 @@ var _ = SIGDescribe("CriticalPod", framework.WithSerial(), framework.WithDisrupt
})
})

var _ = SIGDescribe("CriticalPodWithPodLevelResources", framework.WithSerial(), framework.WithDisruptive(), feature.PodLevelResources, func() {
Contributor

Since this is the first addition of Node E2E tests related to the Pod Level Resources feature, kubernetes/test-infra#35061 seems necessary for the presubmit job (pull-kubernetes-node-e2e-containerd-alpha-features) to pass.

Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: PRs Waiting on Author
Development

Successfully merging this pull request may close these issues.

Add Support for Pod-Level Resources in eviction manager
4 participants