NodeUnschedulable: scheduler queueing hints #119396
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from f5fe523 to ea00d6d
/assign @sanposhiho
Sorry for the delay 🙏 (summer holidays + other prioritized tasks).
Looks good overall; I just left several comments to make things cleaner.
Force-pushed from ea00d6d to 2085427
/lgtm
@kubernetes/sig-scheduling-leads @kerthcet
Can someone take another review for /approve? (the bot assigned me as both reviewer and approver 😓)
LGTM label has been added. Git tree hash: befa96a3d596786b219342710518e3eb58ba4ea7
return framework.QueueAfterBackoff
}

originalNodeSchedulable, modifiedNodeSchedulable := false, !modifiedNode.Spec.Unschedulable
I think we don't need to care about originalNodeSchedulable, since it's always false: if it were true, the Pod would have passed the NodeUnschedulable plugin. Dropping the check makes the logic simpler.
We need originalNodeSchedulable.
Let's say NodeA.Spec.Unschedulable=false and NodeB.Spec.Unschedulable=true, and a Pod is rejected by this plugin.
In this case, in isSchedulableAfterNodeChange, we should ignore all changes to NodeA because NodeA is not related to this plugin's failure; what we need to care about is the change to NodeB only. But if we don't have originalNodeSchedulable, we always return QueueAfterBackoff for all changes to NodeA.
So, in order to filter out such unrelated events for NodeA, we need originalNodeSchedulable so that we return QueueAfterBackoff only when NodeB goes from Spec.Unschedulable=true to Spec.Unschedulable=false.
Why would we return QueueAfterBackoff if NodeB.Spec.Unschedulable=true?
I mean QueueAfterBackoff should be returned only when NodeB gets changed from Unschedulable=true to Unschedulable=false.
Got it, thanks for the explanation.
I still don't understand why we need to restrict returning QueueAfterBackoff to the transition.
I get that, in general, it shouldn't matter: once there is a transition to Unschedulable=false, the plugin would only return Success, so this function shouldn't be called.
But before #120334, we would call this function again. I think it's actually safer not to restrict passing this check to transitions only.
Let's say there is only one Node with Unschedulable=true in the cluster and almost all other Nodes are Unschedulable=false. In that case, all rejected Pods would get NodeUnschedulable recorded in their unschedulable plugins.
And if we didn't restrict returning QueueAfterBackoff to the transition, and instead returned QueueAfterBackoff for events on all Unschedulable=false Nodes, we would keep requeueing rejected Pods to activeQ/backoffQ on tons of unrelated events for unrelated Nodes (those that were Unschedulable=false from the start). The only events we should observe are those on the Node which is/was Unschedulable=true.
> But before #120334, we would call this function again. I think it's actually safer not to restrict passing this check to transitions only.
So... yes. If we want to account for unknown bugs like #120334 that haven't been found yet, we should make all plugins more conservative and return QueueAfterBackoff not only on the transition. But I'm not sure it's worth doing for bugs that haven't been found.
Like NodeUnschedulable, the same thing would happen for many other Node-level scheduling constraints (NodeTaint, NodeAffinity, etc.): we would need to return QueueAfterBackoff for changes to all untainted Nodes, for changes to all matching Nodes, and so on.
I see. Thanks for the additional context.
Let's pass the check only for transitions then.
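To make the agreed behavior concrete, here is a minimal sketch of a transition-only hint. It only illustrates the idea discussed above and is not the merged code: the free-function form, the nil-original handling, and the QueueSkip return value are assumptions for this sketch.

package nodeunschedulable

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// isSchedulableAfterNodeChange sketches the transition-only hint discussed above:
// requeue only when a Node goes from Unschedulable=true to Unschedulable=false.
func isSchedulableAfterNodeChange(pod *v1.Pod, originalNode, modifiedNode *v1.Node) framework.QueueingHint {
	// Treat a newly added Node (no original object) as previously unschedulable,
	// so that a freshly added schedulable Node still requeues the rejected Pod.
	originalNodeSchedulable, modifiedNodeSchedulable := false, !modifiedNode.Spec.Unschedulable
	if originalNode != nil {
		originalNodeSchedulable = !originalNode.Spec.Unschedulable
	}
	if !originalNodeSchedulable && modifiedNodeSchedulable {
		// Unschedulable=true -> Unschedulable=false: this change could make the Pod schedulable.
		return framework.QueueAfterBackoff
	}
	// Any other change (e.g. an update to a Node that was already schedulable)
	// is unrelated to this plugin's rejection, so skip requeueing.
	return framework.QueueSkip
}

With this shape, an update to NodeA (already schedulable) is skipped, while NodeB flipping from Unschedulable=true to false returns QueueAfterBackoff, which is exactly the filtering described in the thread above.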
return framework.QueueAfterBackoff
}

podToleratesUnschedulable := v1helper.TolerationsTolerateTaint(pod.Spec.Tolerations, &v1.Taint{
I think the right check order should be:
if !newNode.Spec.Unschedulable {
    return framework.QueueAfterBackoff
}
// otherwise, verify the tolerations
verifyTolerations()
Because when node.Spec.Unschedulable is false, we no longer need to check the taint.
+1 Always do the faster calculations first
I removed the logic that checks the taint, because Pod.Spec.Tolerations is an immutable field and this plugin never rejects a Pod that has a toleration for the unschedulable taint. So I think we do not need this logic.
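For context, this is roughly why the toleration re-check is unnecessary: a Pod that tolerates the unschedulable taint is never rejected by this plugin's Filter in the first place, and since Pod.Spec.Tolerations is immutable, a rejected Pod can never start tolerating it later. The following is a paraphrased sketch of the Filter decision, written as a free function with a hypothetical name for brevity; the status message string is also an assumption, not the exact upstream source.

package nodeunschedulable

import (
	v1 "k8s.io/api/core/v1"
	v1helper "k8s.io/kubernetes/pkg/apis/core/v1/helper"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// nodeUnschedulableFilter paraphrases the plugin's Filter decision:
// only Pods without a toleration for the unschedulable taint can be rejected.
func nodeUnschedulableFilter(pod *v1.Pod, node *v1.Node) *framework.Status {
	// A Pod tolerating node.kubernetes.io/unschedulable:NoSchedule also tolerates Spec.Unschedulable.
	podToleratesUnschedulable := v1helper.TolerationsTolerateTaint(pod.Spec.Tolerations, &v1.Taint{
		Key:    v1.TaintNodeUnschedulable,
		Effect: v1.TaintEffectNoSchedule,
	})
	if node.Spec.Unschedulable && !podToleratesUnschedulable {
		// Only Pods without the toleration ever reach this rejection.
		return framework.NewStatus(framework.UnschedulableAndUnresolvable, "node(s) were unschedulable")
	}
	return nil
}

Because rejection requires !podToleratesUnschedulable, and tolerations cannot change afterwards, the queueing hint can safely look only at the Node transition.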
Can we have a sig-scheduling-approvers team? Kensei always has to @ me individually, and I guess it's normal to @kubernetes/sig-scheduling-approvers for kubefolks.
Feel free to send a PR: https://github.com/kubernetes/org/blob/main/config/kubernetes/sig-scheduling/teams.yaml
+100, let me create the PR.
Force-pushed from 2085427 to 5f5eb64
Force-pushed from db7ef9c to c123e1b
Signed-off-by: wackxu <[email protected]>
Force-pushed from c123e1b to 28dbe8a
/lgtm
/approve
/hold for @sanposhiho
LGTM label has been added. Git tree hash: cee7efc4c0b0d4a233f4d9c8f573f707e63e062a
/retest
/lgtm
/approve
/unhold
Thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kerthcet, sanposhiho, wackxu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Thank you for your patient review and guidance @sanposhiho @alculquicondor @kerthcet
/release-note-edit
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Part of #118893
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: