NodeUnschedulable: scheduler queueing hints #119396
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from f5fe523 to ea00d6d
/assign @sanposhiho
Sorry for the delay 🙏 (summer holidays + other prioritized tasks).
Looks good overall; I just left several comments to make things cleaner.
Force-pushed from ea00d6d to 2085427
/lgtm
@kubernetes/sig-scheduling-leads @kerthcet
Can someone take another review for /approve? (the bot assigned me as both reviewer and approver 😓)
LGTM label has been added. Git tree hash: befa96a3d596786b219342710518e3eb58ba4ea7
return framework.QueueAfterBackoff
}

originalNodeSchedulable, modifiedNodeSchedulable := false, !modifiedNode.Spec.Unschedulable
I think we don't need to care about originalNodeSchedulable, since it's always false: if it were true, the Pod would have passed the NodeUnschedulable plugin. Dropping the check makes the logic simpler.
We need originalNodeSchedulable.
Let's say NodeA.Spec.Unschedulable=false and NodeB.Spec.Unschedulable=true, and a Pod is rejected by this plugin.
In this case, in isSchedulableAfterNodeChange, we should ignore all changes to NodeA because NodeA is not related to this plugin's failure; what we need to care about is the change to NodeB only. But if we don't have originalNodeSchedulable, we always return QueueAfterBackoff for all changes to NodeA.
So, in order to filter out such unrelated events for NodeA, we need originalNodeSchedulable so that we return QueueAfterBackoff only when NodeB goes from Spec.Unschedulable=true to Spec.Unschedulable=false.
Why would we return QueueAfterBackoff if NodeB.Spec.Unschedulable=true?
I mean QueueAfterBackoff should be returned only when NodeB gets changed from Unschedulable=true to Unschedulable=false.
Got it, thanks for the explanation.
I still don't understand why we need to restrict returning QueueAfterBackoff to the transition.
I get that, in general, it shouldn't matter: once there is a transition to Unschedulable=false, the plugin would only return Success, so this function shouldn't be called.
But before #120334, we would call this function again. I think it's actually safer not to restrict passing this check to transitions only.
Let's say there is only one Node with Unschedulable=true in the cluster and almost all other Nodes are Unschedulable=false. In that case, all rejected Pods would get NodeUnschedulable recorded in their unschedulable plugins.
And if we didn't restrict returning QueueAfterBackoff to the transition, and instead returned QueueAfterBackoff for events on all Unschedulable=false Nodes, we would keep requeueing rejected Pods to activeQ/backoffQ on tons of unrelated events for unrelated Nodes (those that were Unschedulable=false from the start). The only events we should observe are those on the Node which is/was Unschedulable=true.
> But before #120334, we would call this function again. I think it's actually safer not to restrict passing this check to transitions only.
So... yes. If we want to account for unknown bugs like #120334 that haven't been found yet, we should make all plugins more conservative and return QueueAfterBackoff not only on the transition. But I'm not sure it's worth doing for bugs that haven't been found.
Like NodeUnschedulable, the same thing would happen for many other Node-level scheduling constraints (NodeTaint, NodeAffinity, etc.): we would need to return QueueAfterBackoff for changes to all untainted Nodes, for changes to all matching Nodes, and so on.
I see. Thanks for the additional context.
Let's pass the check only for transitions then.
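To make the agreed behavior concrete, here is a minimal sketch of a transition-only hint. It only illustrates the idea discussed above and is not the merged code: the free-function form, the nil-original handling, and the QueueSkip return value are assumptions for this sketch.

package nodeunschedulable

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// isSchedulableAfterNodeChange sketches the transition-only hint discussed above:
// requeue only when a Node goes from Unschedulable=true to Unschedulable=false.
func isSchedulableAfterNodeChange(pod *v1.Pod, originalNode, modifiedNode *v1.Node) framework.QueueingHint {
	// Treat a newly added Node (no original object) as previously unschedulable,
	// so that a freshly added schedulable Node still requeues the rejected Pod.
	originalNodeSchedulable, modifiedNodeSchedulable := false, !modifiedNode.Spec.Unschedulable
	if originalNode != nil {
		originalNodeSchedulable = !originalNode.Spec.Unschedulable
	}
	if !originalNodeSchedulable && modifiedNodeSchedulable {
		// Unschedulable=true -> Unschedulable=false: this change could make the Pod schedulable.
		return framework.QueueAfterBackoff
	}
	// Any other change (e.g. an update to a Node that was already schedulable)
	// is unrelated to this plugin's rejection, so skip requeueing.
	return framework.QueueSkip
}

With this shape, an update to NodeA (already schedulable) is skipped, while NodeB flipping from Unschedulable=true to false returns QueueAfterBackoff, which is exactly the filtering described in the thread above.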
return framework.QueueAfterBackoff
}

podToleratesUnschedulable := v1helper.TolerationsTolerateTaint(pod.Spec.Tolerations, &v1.Taint{
I think the right check order should be:
if !newNode.Spec.Unschedulable {
    return framework.QueueAfterBackoff
}
// otherwise, verify the tolerations
verifyTolerations()
Because when node.Spec.Unschedulable is false, we no longer need to check the taint.
+1 Always do the faster calculations first
I removed the logic that checks the taint, because Pod.Spec.Tolerations is an immutable field and this plugin never rejects a Pod that has a toleration for the unschedulable taint. So I think we do not need this logic.
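For context, this is roughly why the toleration re-check is unnecessary: a Pod that tolerates the unschedulable taint is never rejected by this plugin's Filter in the first place, and since Pod.Spec.Tolerations is immutable, a rejected Pod can never start tolerating it later. The following is a paraphrased sketch of the Filter decision, written as a free function with a hypothetical name for brevity; the status message string is also an assumption, not the exact upstream source.

package nodeunschedulable

import (
	v1 "k8s.io/api/core/v1"
	v1helper "k8s.io/kubernetes/pkg/apis/core/v1/helper"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// nodeUnschedulableFilter paraphrases the plugin's Filter decision:
// only Pods without a toleration for the unschedulable taint can be rejected.
func nodeUnschedulableFilter(pod *v1.Pod, node *v1.Node) *framework.Status {
	// A Pod tolerating node.kubernetes.io/unschedulable:NoSchedule also tolerates Spec.Unschedulable.
	podToleratesUnschedulable := v1helper.TolerationsTolerateTaint(pod.Spec.Tolerations, &v1.Taint{
		Key:    v1.TaintNodeUnschedulable,
		Effect: v1.TaintEffectNoSchedule,
	})
	if node.Spec.Unschedulable && !podToleratesUnschedulable {
		// Only Pods without the toleration ever reach this rejection.
		return framework.NewStatus(framework.UnschedulableAndUnresolvable, "node(s) were unschedulable")
	}
	return nil
}

Because rejection requires !podToleratesUnschedulable, and tolerations cannot change afterwards, the queueing hint can safely look only at the Node transition.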
Can we have a sig-scheduling-approvers team? Kensei always has to @ me individually, and I guess it's normal to @kubernetes/sig-scheduling-approvers for kubefolks.
Feel free to send a PR: https://github.com/kubernetes/org/blob/main/config/kubernetes/sig-scheduling/teams.yaml
+100, let me create the PR.
Force-pushed from 2085427 to 5f5eb64
Force-pushed from db7ef9c to c123e1b
Signed-off-by: wackxu <[email protected]>
Force-pushed from c123e1b to 28dbe8a
/lgtm
/approve
/hold for @sanposhiho
LGTM label has been added. Git tree hash: cee7efc4c0b0d4a233f4d9c8f573f707e63e062a
/retest
/lgtm
/approve
/unhold
Thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kerthcet, sanposhiho, wackxu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Thank you for your patient review and guidance @sanposhiho @alculquicondor @kerthcet
/release-note-edit
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Part of #118893
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: