[FG:InPlacePodVerticalScaling] Prioritize resize requests by priorityClass and qos class #132342


Open · wants to merge 2 commits into master from prioritized_resizes
Conversation

natasha41575 (Contributor) commented Jun 16, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Prioritizes resize requests by PriorityClass and QoS class when there is not enough room on the node to accept all of the resize requests.

Link to design discussion: kubernetes/enhancements#5266

Which issue(s) this PR is related to:

Fixes #116971

Special notes for your reviewer:

This PR builds on #131612 (the first commit here is all the changes in #131612).

Does this PR introduce a user-facing change?

Prioritize resize requests by PriorityClass and QoS class when there is not enough room on the node to accept all the resize requests.

k8s-ci-robot (Contributor):

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 16, 2025
@k8s-ci-robot k8s-ci-robot requested review from bart0sh and ffromani June 16, 2025 22:01
k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: natasha41575
Once this PR has been reviewed and has the lgtm label, please assign sergeykanzhelev for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 16, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 16, 2025
@natasha41575 natasha41575 force-pushed the prioritized_resizes branch from 44f82ba to 05a324b Compare June 20, 2025 17:15
@natasha41575 natasha41575 marked this pull request as ready for review June 20, 2025 17:17
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2025
@k8s-ci-robot k8s-ci-robot requested review from dims and kannon92 June 20, 2025 17:17
natasha41575 (Contributor, Author):

/triage accepted
/priority important-soon
/assign @tallclair

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 20, 2025
k8s-ci-robot (Contributor):

@natasha41575: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubernetes-unit-windows-master
Commit: 05a324b
Required: false
Rerun command: /test pull-kubernetes-unit-windows-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

tallclair (Member) left a review:

Ran out of time, will take another pass tomorrow.

@@ -47,6 +53,12 @@ const (
	actuatedPodsStateFile = "actuated_pods_state"
)

var (
	// ticker is used to periodically retry pending resizes.
	ticker = time.NewTicker(retryPeriod)
tallclair (Member):

nit: ticker should be a member of the manager struct, not a global

}

oldResizeStatus := m.statusManager.GetPodResizeConditions(uid)
defer func() {
tallclair (Member):

Does this need to be a defer? I can't quite remember... was this to avoid a deadlock? If so, leave a comment to that effect.

kl.statusManager.SetPodResizeInProgressCondition(pod.UID, v1.PodReasonError, r.Message, false)
if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) {
for _, r := range result.SyncResults {
if r.Action == kubecontainer.ResizePodInPlace {
tallclair (Member):

If the resize restart policy is RestartContainer, then the sync action won't be ResizePodInPlace, but it could still result in the resize being actuated.

return
}
var podStatus *kubecontainer.PodStatus
podStatus, err = m.podcache.Get(pod.UID)
tallclair (Member):

The pod status is used here to determine if containers are running, which in turn determines whether to evaluate the container for deciding whether the resize is in-progress. There is the potential for a race condition here, where the allocation manager sets the condition one way, but by the time it's synced the container status has changed.

Since the status is only ever written from within SyncPod, can we just move all handling of the InProgress condition into SyncPod?

return false
}

if isResizeIncreasingAnyRequestsForContainer(allocatedPod.Spec.Resources, pod.Spec.Resources) {
tallclair (Member):

check the pod-level resources feature gate here

return true
}

for i, c := range pod.Spec.Containers {
tallclair (Member):

Also need to check sidecar containers.

tallclair (Member):

Suggested change:
	-for i, c := range pod.Spec.Containers {
	+for c, cType := range podutil.ContainerIter(pod.Spec, podutil.InitContainers|podutil.Containers) {
	+	if !isResizableContainer(c, cType) {
	+		continue
	+	}
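The point about sidecars can be illustrated with a simplified stand-in (the types below are invented for this sketch; the real code would use `podutil.ContainerIter`): a plain range over `pod.Spec.Containers` never visits init containers, even though restartable init containers (sidecars) are also resizable.

```go
package main

import "fmt"

// container is a stand-in for v1.Container with only the fields we need.
type container struct {
	name          string
	restartPolicy string // "Always" marks a restartable init container (sidecar)
}

// podSpec is a stand-in for v1.PodSpec.
type podSpec struct {
	initContainers []container
	containers     []container
}

// resizableContainers walks init and regular containers and filters, the
// way ContainerIter plus an isResizableContainer check would: among init
// containers, only sidecars (restartPolicy "Always") are resizable.
func resizableContainers(spec podSpec) []string {
	var out []string
	for _, c := range spec.initContainers {
		if c.restartPolicy == "Always" { // sidecar
			out = append(out, c.name)
		}
	}
	for _, c := range spec.containers {
		out = append(out, c.name)
	}
	return out
}

func main() {
	spec := podSpec{
		initContainers: []container{{name: "init"}, {name: "sidecar", restartPolicy: "Always"}},
		containers:     []container{{name: "app"}},
	}
	fmt.Println(resizableContainers(spec)) // [sidecar app]
}
```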


var oldCPURequests, newCPURequests, oldMemRequests, newMemRequests *apiresource.Quantity

if old != nil && old.Requests != nil {
tallclair (Member):

I think you can simplify this method a lot. I think this should work, even if requests are null or CPU is missing:

if old.Requests.Cpu().Cmp(new.Requests.Cpu()) > 0 {
	return true
}
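The nil-safety this simplification relies on (`ResourceList.Cpu()` returns a zero Quantity when the map is nil or the key is absent) can be shown with a simplified stand-in that uses int64 milli-CPU values instead of the real `resource.Quantity` type:

```go
package main

import "fmt"

// requests models v1.ResourceList with CPU in milli-units. Reading from a
// nil map or a missing key yields 0, mirroring how ResourceList.Cpu()
// returns a zero Quantity, so no nil checks are needed before comparing.
type requests map[string]int64

func (r requests) cpu() int64 { return r["cpu"] } // nil-safe: 0 if absent

// oldCPUExceedsNew has the shape of the reviewer's
// old.Requests.Cpu().Cmp(new.Requests.Cpu()) > 0 check.
func oldCPUExceedsNew(old, new requests) bool {
	return old.cpu() > new.cpu()
}

func main() {
	fmt.Println(oldCPUExceedsNew(requests{"cpu": 500}, requests{"cpu": 250})) // true
	fmt.Println(oldCPUExceedsNew(nil, requests{"cpu": 250}))                  // false: nil reads as zero
}
```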

// - Second, based on the pod's PriorityClass.
// - Third, based on the pod's QoS class.
// - Last, prioritizing resizes that have been in the deferred state the longest.
func (m *manager) sortPendingPodsByPriority() {
tallclair (Member):

nit: this isn't just sorting by the priority value on the pod. Maybe sortPendingResizes instead?

Suggested change:
	-func (m *manager) sortPendingPodsByPriority() {
	+func (m *manager) sortPendingResizes() {

Comment on lines +318 to +323
if !firstPodIncreasing && secondPodIncreasing {
	return true
}
if !secondPodIncreasing && firstPodIncreasing {
	return false
}
tallclair (Member):

nit: If neither is increasing, the order doesn't really matter, so this can be simplified:

Suggested change:
	-if !firstPodIncreasing && secondPodIncreasing {
	-	return true
	-}
	-if !secondPodIncreasing && firstPodIncreasing {
	-	return false
	-}
	+if !firstPodIncreasing {
	+	return true
	+} else if !secondPodIncreasing {
	+	return false
	+}

Development

Successfully merging this pull request may close these issues.

In place pod resizing should be designed into the kubelet config state loop, not alongside it
3 participants