[FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop #131612


Open: wants to merge 5 commits into master from the move-resize-logic branch

Conversation

@natasha41575 (Contributor) commented May 5, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This moves the in-place pod resize allocation logic out of the sync loop. This PR is organized into the following 5 commits:

  1. Untangle the HandlePodResourcesResize unit tests and move them into the allocation package
  2. Add helper methods IsPodResizeInfeasible and IsPodResizeDeferred to the status_manager (a sketch of these helpers follows this list).
  3. Update the allocation_manager methods to hold all the control logic required for handling pod resize allocation, and update the kubelet so it no longer attempts to allocate pod resizes in the sync loop (with unit tests updated accordingly).
  4. Update the allocation manager's unit tests to cover PushPendingResizes and RetryPendingResizes
  5. Skip pending resize evaluation if sources aren't ready (per discussion on the previous PR)
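
For commit 2, the helpers are presumably thin checks over the pod's resize conditions. A minimal sketch in Go, assuming the PodResizePending condition type and the "Deferred"/"Infeasible" reason strings used by the InPlacePodVerticalScaling API; the bodies here are illustrative, not the PR's actual code:

    package status

    import (
        v1 "k8s.io/api/core/v1"
    )

    // hasResizePendingReason reports whether the pod status carries a
    // PodResizePending condition with the given reason. The condition type
    // and reason strings are assumptions based on the feature's API.
    func hasResizePendingReason(status *v1.PodStatus, reason string) bool {
        for _, c := range status.Conditions {
            if c.Type == v1.PodConditionType("PodResizePending") && c.Reason == reason {
                return true
            }
        }
        return false
    }

    // IsPodResizeDeferred: the resize doesn't fit now but may fit later.
    func IsPodResizeDeferred(status *v1.PodStatus) bool {
        return hasResizePendingReason(status, "Deferred")
    }

    // IsPodResizeInfeasible: the resize can never fit on this node.
    func IsPodResizeInfeasible(status *v1.PodStatus) bool {
        return hasResizePendingReason(status, "Infeasible")
    }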

The intention of this PR is to reattempt pending resizes:

  • whenever HandlePodAdditions or HandlePodUpdates receives a resize request that it didn't already have,
  • upon deletion of another pod,
  • upon the successful actuation of another resize,
  • or periodically. This PR sets the retry timer to 3 minutes, but we should think about whether that is the right interval (see the sketch after this list).
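
Roughly, that retry plumbing could take the following shape (a sketch with assumed names; retryCh and the run loop are illustrative, not the PR's exact code):

    package allocation

    import "time"

    // manager is a pared-down stand-in for the allocation manager, with
    // just enough state to show the retry triggers listed above.
    type manager struct {
        // retryCh is poked when another pod is deleted, when another
        // resize is successfully actuated, or when a new resize arrives.
        retryCh chan struct{}
    }

    // RetryPendingResizes re-evaluates queued resizes against the node's
    // current allocatable resources. Body elided in this sketch.
    func (m *manager) RetryPendingResizes() {}

    func (m *manager) run(stop <-chan struct{}) {
        const retryPeriod = 3 * time.Minute // the PR's timer; may change
        ticker := time.NewTicker(retryPeriod)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C: // periodic retry
                m.RetryPendingResizes()
            case <-m.retryCh: // event-driven retry
                m.RetryPendingResizes()
                ticker.Reset(retryPeriod) // skip the next redundant tick
            case <-stop:
                return
            }
        }
    }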

Special notes for your reviewer:

Intended follow-ups:

  1. This PR is required for, but does not include, the implementation of prioritized resizes. The PR was already getting too large to review, and the design for prioritized resizes is still pending (KEP-1287: Priority of Resize Requests, enhancements#5266). This change is also useful on its own without prioritized resizes, but I left a TODO for that.
  2. Some cleanup, such as moving some unit tests around, unexporting functions that no longer need to be exported, and removing code that is no longer needed. I left some of this out of this PR to keep the size down.

Which issue(s) this PR fixes:

Does not yet fix it, but this is part of #116971.

Does this PR introduce a user-facing change?

NONE

/sig node
/priority important-soon
/triage accepted
/cc @tallclair

TODO:

  • retry deferred resizes in HandlePodCleanups
    • I don't think anything in HandlePodCleanups affects the admission decision (though I could be wrong). The admission decision appears to depend on the pod manager as the source of truth (through kl.podManager.GetPods), and the pod manager is not updated in HandlePodCleanups, so retrying the pending resizes there shouldn't be necessary.
  • double-check that the logic in HandlePodAdditions and HandlePodUpdates is correct (maybe add unit tests covering resize cases)
  • allocation manager unit tests
  • fix an issue where the pod status shows updated allocated and actual resources even when the resize is deferred and not yet allocated or actuated
  • fix an issue where a pending resize that gets reverted does not have its pending condition cleared quickly enough
  • sanity-check by running the e2e tests locally
  • there seems to be more latency than necessary in accepting a pending resize after another pod is scaled down to make room; I want to investigate this (though it doesn't necessarily have to be blocking)
  • skip retry of pending resizes if sources aren't ready (!kl.sourcesReady.AllReady())
  • rebase on move pod admission and resize logic into the allocation manager #131801

@k8s-ci-robot k8s-ci-robot requested a review from tallclair May 5, 2025 16:33
@k8s-ci-robot (Contributor) commented:

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 5, 2025
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 5, 2025
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 5, 2025
@natasha41575 natasha41575 changed the title Move resize logic [FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop May 5, 2025
@natasha41575 natasha41575 moved this from Triage to Work in progress in SIG Node: code and documentation PRs May 5, 2025
@natasha41575 natasha41575 moved this from Triage to Archive-it in SIG Node CI/Test Board May 5, 2025
@natasha41575 natasha41575 force-pushed the move-resize-logic branch 3 times, most recently from 433aa61 to d52210d on May 6, 2025 21:19
@natasha41575 natasha41575 force-pushed the move-resize-logic branch 2 times, most recently from 9b16320 to 42ececf on May 7, 2025 20:33
@natasha41575 natasha41575 marked this pull request as ready for review May 7, 2025 21:30
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2025
@SergeyKanzhelev SergeyKanzhelev moved this from Needs Approver to Needs Reviewer in SIG Node: code and documentation PRs May 31, 2025
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: natasha41575
Once this PR has been reviewed and has the lgtm label, please ask for approval from tallclair. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 2, 2025
@natasha41575 natasha41575 requested a review from tallclair June 2, 2025 22:37
@natasha41575 (Author) commented:

@tallclair this is rebased, and I added skipping of the pending resize evaluation when the sources are not ready.

There's one more thing I want to check: there still seems to be some unnecessary latency after another pod is sized down before the space is considered "free" and a pending resize can succeed. I'm going to look into that, but otherwise the PR is ready for your review.

@natasha41575 (Author) commented:

it seems like there's still a little extra unnecessary latency after another pod is sized down before the space is considered "free" so that a pending resize can succeed,

This is fixed now; it was just because of something dumb I did.

natasha41575 added two commits to natasha41575/kubernetes that referenced this pull request on Jun 20, 2025
@k8s-ci-robot (Contributor) commented:

@natasha41575: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubernetes-unit-windows-master
Commit: 91a33e4
Required: false
Rerun command: /test pull-kubernetes-unit-windows-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tallclair (Member) left a comment:

Finished reviewing; please include the relevant comments from #132342 (review) too.

    ticker.Reset(retryPeriod)

    if !m.sourcesReady.AllReady() {
        klog.V(4).InfoS("Skipping evaluation of pending resizes; sources are not ready")
@tallclair:
If there's an empty source, it will send an empty update to mark the source as seen. Will that trigger a retry here? If not, we might want to use a shorter retryPeriod until all sources are ready; otherwise there's a risk of a resize getting stuck waiting the full 3 minutes (although that's pretty unlikely).
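
One way to read that suggestion, continuing the earlier retry-loop sketch (same assumed package and imports; the 10-second value is purely illustrative):

    // Hypothetical: poll faster while config sources are still syncing,
    // so a resize that arrived early doesn't wait the full retryPeriod.
    type sourcesReady interface{ AllReady() bool }

    func nextRetryPeriod(sources sourcesReady) time.Duration {
        if !sources.AllReady() {
            return 10 * time.Second // assumed shorter period; not from the PR
        }
        return 3 * time.Minute // the full retryPeriod
    }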

    } else {
        // We can hit this case if a pending resize has been reverted,
        // so we need to clear the pending resize condition.
        kl.allocationManager.ClearPodResizePendingCondition(pod.UID)
@tallclair:
Should the kubelet just call the status manager directly here, rather than plumbing this through the allocation manager? Alternatively, it might make more sense to call this "RemovePendingResize", which removes it from the queue and clears the pending condition (that's not strictly necessary, though).

Comment on lines +320 to +321
    m.podStatusesLock.Lock()
    defer m.podStatusesLock.Unlock()
@tallclair:
nit: RLock/RUnlock (same below)

Suggested change:

    - m.podStatusesLock.Lock()
    - defer m.podStatusesLock.Unlock()
    + m.podStatusesLock.RLock()
    + defer m.podStatusesLock.RUnlock()

    @@ -2655,6 +2671,14 @@ func (kl *Kubelet) HandlePodAdditions(pods []*v1.Pod) {
    for _, update := range updates {
@tallclair:
The comment on the previous line no longer applies. I think you can just call UpdatePod directly in the previous loop now, since SyncPod doesn't resolve resizes anymore. Does that look correct?

    // existing pods are added.
    allocatedPod, updatedFromAllocation := kl.allocationManager.UpdatePodFromAllocation(pod)
    if updatedFromAllocation {
        pod = allocatedPod
@tallclair:
nit: I think this is unnecessary since UpdatePod handles it


    if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) {
        if resizeRequest {
            kl.allocationManager.RetryPendingResizes()
@tallclair:
Maybe this should be moved to before the UpdatePod call? If the resize can be allocated immediately, it would be better to sync the pod with the resize first.

Ideally we'd avoid the double call to UpdatePod if the resize is immediately actuated, but I don't see a good way to do that right now. Maybe leave a TODO?
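
Concretely, the reordering might look like this (a sketch around the quoted snippet; the UpdatePod options are elided, and the TODO marks the double-sync concern):

    // Attempt the resize first so the following sync already sees the
    // newly allocated resources.
    if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) && resizeRequest {
        kl.allocationManager.RetryPendingResizes()
    }
    // TODO: avoid the second UpdatePod when the retry above already
    // actuated this pod's resize.
    kl.podWorkers.UpdatePod(options)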

@k8s-ci-robot (Contributor) commented:

@shiya0705: GitHub didn't allow me to request PR reviews from the following users: shiya0705.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
