[FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop #131612
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: natasha41575. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@tallclair this is rebased, and I added skipping the pending resize evaluation when the sources are not ready. There's one more thing I want to check: it seems like there's still a little extra unnecessary latency after another pod is sized down before the space is considered "free" so that a pending resize can succeed. I'm going to look into that, but other than that the PR is ready for your review.
This is fixed now; it was just because of something dumb I did.
@natasha41575: The following test failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Finished reviewing; please include the relevant comments from #132342 (review) too.
ticker.Reset(retryPeriod)

if !m.sourcesReady.AllReady() {
	klog.V(4).InfoS("Skipping evaluation of pending resizes; sources are not ready")
If there's an empty source, it will send an empty update to mark the source as seen. Will that trigger a retry here? If not, we might want to use a shorter retryPeriod until all sources are ready; otherwise there's a risk of a resize getting stuck waiting the full 3 minutes (although that's pretty unlikely).
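For illustration, a minimal sketch of the shorter-retry-period idea, assuming a ticker-driven retry loop; the manager type, the shortRetryPeriod value, and the retryPendingResizes helper are assumptions, not the PR's actual code:

```go
package allocation

import (
	"time"

	"k8s.io/klog/v2"
)

type sourcesReady interface {
	AllReady() bool
}

type manager struct {
	sourcesReady sourcesReady
}

func (m *manager) retryPendingResizes() {
	// Re-evaluate queued resizes against current node capacity (elided).
}

// runRetryLoop polls with a short interval until all config sources are ready,
// then falls back to the normal retry period, so a pending resize is not stuck
// waiting the full period while sources are still syncing.
func (m *manager) runRetryLoop(stopCh <-chan struct{}) {
	const (
		shortRetryPeriod = 10 * time.Second // assumed value while sources are syncing
		retryPeriod      = 3 * time.Minute  // the full period mentioned above
	)
	ticker := time.NewTicker(shortRetryPeriod)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			if !m.sourcesReady.AllReady() {
				klog.V(4).InfoS("Skipping evaluation of pending resizes; sources are not ready")
				continue // keep the short period until sources are ready
			}
			ticker.Reset(retryPeriod)
			m.retryPendingResizes()
		}
	}
}
```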
} else {
	// We can hit this case if a pending resize has been reverted,
	// so we need to clear the pending resize condition.
	kl.allocationManager.ClearPodResizePendingCondition(pod.UID)
Should kubelet just call the status manager directly here, rather than plumbing this through the allocation manager? Alternatively, it might make more sense to call this "RemovePendingResize", which removes it from the queue and clears the pending condition (that's not necessary, though).
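A hypothetical sketch of the "RemovePendingResize" alternative: remove the pod from the pending-resize queue and clear the pending condition in one call. The queue fields and the status-manager interface here are illustrative assumptions, not the PR's API:

```go
package allocation

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

type statusManager interface {
	ClearPodResizePendingCondition(uid types.UID)
}

type manager struct {
	queueLock      sync.Mutex
	pendingResizes map[types.UID]struct{}
	statusManager  statusManager
}

// RemovePendingResize drops the pod from the retry queue and clears its
// PodResizePending condition, e.g. when a pending resize has been reverted.
func (m *manager) RemovePendingResize(uid types.UID) {
	m.queueLock.Lock()
	delete(m.pendingResizes, uid)
	m.queueLock.Unlock()

	m.statusManager.ClearPodResizePendingCondition(uid)
}
```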
m.podStatusesLock.Lock()
defer m.podStatusesLock.Unlock()
nit: RLock/RUnlock (same below)
Suggested change:
-	m.podStatusesLock.Lock()
-	defer m.podStatusesLock.Unlock()
+	m.podStatusesLock.RLock()
+	defer m.podStatusesLock.RUnlock()
@@ -2655,6 +2671,14 @@ func (kl *Kubelet) HandlePodAdditions(pods []*v1.Pod) {
	for _, update := range updates {
The comment on the previous line no longer applies. I think you can just call UpdatePod directly in the previous loop now, since SyncPod doesn't resolve resizes anymore. Does that look correct?
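A rough sketch of that restructuring, using simplified stand-ins for the kubelet's types (Pod, UpdatePodOptions, and podWorkers are stubs here, not the real kubelet API): once SyncPod no longer resolves resizes, each admitted pod can be dispatched directly in the first loop instead of being collected for a second pass:

```go
package sketch

import "time"

type Pod struct{ Name string }

type UpdatePodOptions struct {
	Pod        *Pod
	UpdateType string
	StartTime  time.Time
}

type podWorkers interface {
	UpdatePod(options UpdatePodOptions)
}

type Kubelet struct {
	podWorkers podWorkers
}

func (kl *Kubelet) HandlePodAdditions(pods []*Pod) {
	start := time.Now()
	for _, pod := range pods {
		// ... admission checks and allocation bookkeeping elided ...

		// Dispatch directly instead of building an updates slice for a
		// second loop.
		kl.podWorkers.UpdatePod(UpdatePodOptions{
			Pod:        pod,
			UpdateType: "create",
			StartTime:  start,
		})
	}
}
```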
// existing pods are added.
allocatedPod, updatedFromAllocation := kl.allocationManager.UpdatePodFromAllocation(pod)
if updatedFromAllocation {
	pod = allocatedPod
nit: I think this is unnecessary since UpdatePod handles it.
if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) {
	if resizeRequest {
		kl.allocationManager.RetryPendingResizes()
Maybe this should be moved to before the UpdatePod call? If the resize can be allocated immediately, it would be better to sync the pod with the resize first. Ideally we'd avoid the double call to UpdatePod if the resize is immediately actuated, but I don't see a good way to do that right now. Maybe leave a TODO?
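A minimal sketch of the suggested ordering, again with stub interfaces standing in for kubelet internals (none of these names are the PR's actual code): retry pending resizes before handing the update to the pod workers, and record the open question as a TODO:

```go
package sketch

// Stub interfaces standing in for kubelet internals; not the real APIs.
type allocationManager interface{ RetryPendingResizes() }
type podWorkers interface{ UpdatePod(podUID string) }

type kubelet struct {
	allocationManager allocationManager
	podWorkers        podWorkers
}

// dispatchUpdate retries pending resizes before dispatching the update, so a
// resize that can be allocated immediately is synced together with the pod.
func (kl *kubelet) dispatchUpdate(podUID string, resizeRequest bool) {
	if resizeRequest {
		// TODO: avoid the double UpdatePod call when the resize is
		// immediately actuated.
		kl.allocationManager.RetryPendingResizes()
	}
	kl.podWorkers.UpdatePod(podUID)
}
```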
@shiya0705: GitHub didn't allow me to request PR reviews from the following users: shiya0705. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This moves the in-place pod resize allocation logic out of the sync loop. This PR is organized into 5 commits, which include:
- Move the HandlePodResourcesResize unit tests into the allocation package
- Move IsPodResizeInfeasible and IsPodResizeDeferred to the status_manager

The intention of this PR is to reattempt pending resizes.
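As background for that approach, a minimal sketch of the general pattern under assumed names (Manager, pendingResizes, the per-pod attempt function); this is illustrative only, not the PR's implementation:

```go
package allocation

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// Manager is a simplified stand-in for the allocation manager: it remembers
// resizes that could not be admitted yet and re-evaluates them on demand,
// outside of the pod sync loop.
type Manager struct {
	mu             sync.Mutex
	pendingResizes map[types.UID]func() bool // attempt returns true once the resize is allocated
}

// RetryPendingResizes re-attempts every deferred resize, e.g. after resources
// are freed because another pod was removed or sized down.
func (m *Manager) RetryPendingResizes() {
	m.mu.Lock()
	defer m.mu.Unlock()
	for uid, attempt := range m.pendingResizes {
		if attempt() {
			delete(m.pendingResizes, uid) // allocated; no longer pending
		}
	}
}
```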
Special notes for your reviewer
Intended follow-ups:
Which issue(s) this PR fixes:
Does not yet fix it, but this is part of #116971.
Does this PR introduce a user-facing change?
/sig node
/priority important-soon
/triage accepted
/cc @tallclair
TODO:
- retry deferred resizes in HandlePodCleanups (kl.podManager.GetPods()), and the pod manager is not updated in HandlePodCleanups, so I don't think retrying the pending resizes here is necessary!
- kl.sourcesReady.AllReady()