fix(scheduling_queue): always put Pods with no unschedulable plugins into activeQ/backoffQ #119105

sanposhiho · 2023-07-05T13:21:15Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR changes how the scheduling queue treats scheduling failures.
When a Pod goes back to the queue, it sometimes carries the names of unschedulable plugins and sometimes doesn't.
The former is a rejection by a plugin; the latter is an error from a plugin (or from the scheduler itself, which shouldn't happen unless the scheduler has a bug).
More specifically, when a Pod is rejected in PreFilter, Filter, Reserve or Permit, the scheduler attaches the failed plugin's name to the Pod and puts it back into the queue. When a Pod fails at another extension point (such as Bind), probably because of a kube-apiserver failure (or a bug in the plugin implementation), the scheduler puts the Pod back into the queue without any plugin names.
The latter case is more likely an unexpected failure, and a cluster event may never resolve it.
This PR changes the scheduling queue to always put Pods without failed plugins into activeQ/backoffQ, so that such Pods don't get stuck in unschedQ forever.
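
For readers who want the rule in code form, here is a minimal sketch of the requeue decision described above. It is a simplified model and not the actual kube-scheduler implementation: `queuedPod`, `requeueTarget`, the `backingOff` flag and the `eventMayHelp` parameter are hypothetical names made up for this illustration.

```go
package main

import "fmt"

// queueName identifies one of the scheduler's internal queues in this sketch.
type queueName string

const (
	activeQ        queueName = "activeQ"
	backoffQ       queueName = "backoffQ"
	unschedulableQ queueName = "unschedulableQ"
)

// queuedPod is a hypothetical, stripped-down stand-in for the queued Pod info.
type queuedPod struct {
	name                 string
	unschedulablePlugins map[string]struct{} // plugins that rejected the Pod
	backingOff           bool                // true while the backoff period is still running
}

// requeueTarget picks the queue for a Pod whose scheduling attempt failed.
// eventMayHelp is an assumed input: whether a cluster event seen during the
// attempt could make the Pod schedulable.
func requeueTarget(p queuedPod, eventMayHelp bool) queueName {
	// No failed plugins means the failure was an error rather than a rejection,
	// so no cluster event can be expected to wake the Pod up: always retry.
	if len(p.unschedulablePlugins) == 0 || eventMayHelp {
		if p.backingOff {
			return backoffQ
		}
		return activeQ
	}
	return unschedulableQ
}

func main() {
	// A Pod that failed with an error (for example at Bind) carries no plugin names.
	errored := queuedPod{name: "errored"}
	// A Pod rejected by a plugin carries that plugin's name.
	rejected := queuedPod{
		name:                 "rejected",
		unschedulablePlugins: map[string]struct{}{"fooPlugin": {}},
	}

	fmt.Println(requeueTarget(errored, false))
	fmt.Println(requeueTarget(rejected, false))
}
```

The point of the sketch is the first branch: an empty plugin set short-circuits to activeQ/backoffQ no matter which events were observed.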

Which issue(s) this PR fixes:

Ref: #118438 (comment)

Special notes for your reviewer:

Does this PR introduce a user-facing change?

More accurate requeueing in the scheduling queue for Pods rejected because of a temporary failure (e.g., a temporary failure on kube-apiserver).

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2023-07-05T13:21:24Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sanposhiho · 2023-07-05T13:23:03Z

/hold

To involve other approvers.

/cc @alculquicondor

k8s-ci-robot · 2023-07-05T13:43:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/scheduler/OWNERS~~ [sanposhiho]
~~test/integration/scheduler/OWNERS~~ [sanposhiho]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Huang-Wei · 2023-07-05T17:28:00Z

pkg/scheduler/internal/queue/scheduling_queue_test.go

Are we able to reproduce the potential bug, and then verify the fix does work?

Sorry for the delay, I'll work on that this weekend.

I added TestRequeueByPermitRejection.
Please check out sanposhiho@e0cd7a0.
You can see the failed test result from running the same test on the master branch.

alculquicondor · 2023-07-10T15:30:58Z

Also, what if there was no event, but the failure was an apiserver failure?

In this case, the pod should enter the active queue or backoff, but not the unschedulable queue.

sanposhiho · 2023-07-14T10:12:49Z

@alculquicondor I think that's a good point, and it could be another blocker for removing the flush: without a flush, Pods rejected because of a kube-apiserver failure would, in the worst case, get stuck in unschedQ.

But I'd say it's difficult for the scheduler to know whether an error comes from an unstable kube-apiserver, because the scheduler cannot know which plugin at which extension point communicates with kube-apiserver.
What do you think? Do we need a special return status so that a plugin can tell the scheduler "this error comes from a kube-apiserver failure, and the queue should requeue the Pod into activeQ/backoffQ regardless"?

alculquicondor · 2023-07-14T13:30:18Z

But, I'd say that's kind of difficult from the scheduler side to know whether the error is coming from the kube-apiserver unstable or not.

I don't think we need to make it specific to apiserver errors, but errors in general (as opposed to "Unschedulable(AndUnResolvable)" statuses).
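
To make the distinction concrete, here is a rough sketch of how plugin names could be collected only from rejection statuses and never from errors. It assumes a simplified status model: `statusCode`, `pluginStatus` and `collectUnschedulablePlugins` are hypothetical names for this illustration and only mirror the framework's Error / Unschedulable / UnschedulableAndUnresolvable codes.

```go
package main

import "fmt"

// statusCode is a hypothetical mirror of the scheduler framework's status codes.
type statusCode int

const (
	codeSuccess statusCode = iota
	codeError
	codeUnschedulable
	codeUnschedulableAndUnresolvable
)

// pluginStatus pairs a plugin name with the status it returned.
type pluginStatus struct {
	plugin string
	code   statusCode
}

// collectUnschedulablePlugins returns the plugins that rejected the Pod.
// Errors are deliberately ignored, so a Pod that only hit errors ends up
// with an empty set.
func collectUnschedulablePlugins(statuses []pluginStatus) map[string]struct{} {
	plugins := map[string]struct{}{}
	for _, s := range statuses {
		switch s.code {
		case codeUnschedulable, codeUnschedulableAndUnresolvable:
			plugins[s.plugin] = struct{}{}
		}
	}
	return plugins
}

func main() {
	rejected := collectUnschedulablePlugins([]pluginStatus{
		{plugin: "fooPlugin", code: codeUnschedulable},
	})
	errored := collectUnschedulablePlugins([]pluginStatus{
		{plugin: "fooPlugin", code: codeError},
	})
	fmt.Println(len(rejected), len(errored))
}
```

With this model the queue does not need to know whether an error came from kube-apiserver: an empty plugin set is enough to tell a rejection from an error.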

sanposhiho · 2023-07-14T13:37:11Z

errors in general (as opposed to "Unschedulable(AndUnResolvable)" statuses).

Sounds good. Let me have an issue for this.

alculquicondor · 2023-07-14T14:03:43Z

why not fix in this PR, as it's highly related?

sanposhiho · 2023-09-06T21:58:02Z

We should put this in 1.28.1

Sure, will do after I'm back to the normal days - this Friday.

/unhold

Huang-Wei · 2023-09-07T00:22:28Z

pkg/scheduler/internal/queue/scheduling_queue.go

+=======
+ logger.V(5).Info("Pod moved to an internal scheduling queue", "pod", klog.KObj(pod), "event", ScheduleAttemptFailure, "queue", queue, "schedulingCycle", podSchedulingCycle)
+>>>>>>> dc313b9f0b9 (always put Pods with no unschedulable plugins into activeQ/backoffQ)


Please resolve this.

Oh, sorry, that's already fixed in the latest commit. I'll squash commits when doing rebase. 🙏

alculquicondor · 2023-09-07T12:07:18Z

oh, this is the PR I was thinking of yesterday, @pohly

sanposhiho · 2023-09-08T13:58:59Z

pkg/scheduler/internal/queue/scheduling_queue_test.go

@@ -3180,3 +3203,11 @@ func Test_isPodWorthRequeuing(t *testing.T) {
 })
 }
 }
+
+func mustAddUnschedulableIfNotPresent(t *testing.T, q *PriorityQueue, logger klog.Logger, pInfo *framework.QueuedPodInfo, podSchedulingCycle int64) {


This helper is here to avoid the following pull-kubernetes-linter-hints error at every call site:

ERROR: pkg/scheduler/internal/queue/scheduling_queue_test.go:1559:32: Error return value of `q.AddUnschedulableIfNotPresent` is not checked (errcheck)
ERROR: q.AddUnschedulableIfNotPresent(logger, q.newQueuedPodInfo(highPriorityPodInfo.Pod, "plugin"), q.SchedulingCycle())

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/119105/pull-kubernetes-linter-hints/1700123530751905792
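
For context, a "must" helper of this kind typically wraps the error-returning call and fails the test when it returns an error. Below is a self-contained sketch of that pattern, not the helper body from this PR: `fakeQueue`, `fataler` and `recorder` are hypothetical stand-ins so the example runs outside a test binary, and the fake queue's signature is simplified to a Pod name.

```go
package main

import (
	"errors"
	"fmt"
)

// fataler is the subset of *testing.T the helper needs; *testing.T satisfies it.
type fataler interface {
	Helper()
	Fatalf(format string, args ...interface{})
}

// fakeQueue is a hypothetical stand-in for the scheduler's PriorityQueue.
type fakeQueue struct{ pods map[string]bool }

// AddUnschedulableIfNotPresent returns an error if the Pod is already queued.
func (q *fakeQueue) AddUnschedulableIfNotPresent(pod string) error {
	if q.pods[pod] {
		return errors.New("pod " + pod + " is already present")
	}
	q.pods[pod] = true
	return nil
}

// mustAddUnschedulableIfNotPresent checks the error in one place, so call
// sites in tests don't trip errcheck.
func mustAddUnschedulableIfNotPresent(t fataler, q *fakeQueue, pod string) {
	t.Helper()
	if err := q.AddUnschedulableIfNotPresent(pod); err != nil {
		t.Fatalf("unexpected error from AddUnschedulableIfNotPresent: %v", err)
	}
}

// recorder collects failures instead of aborting, for demonstration only.
type recorder struct{ failures []string }

func (r *recorder) Helper() {}

func (r *recorder) Fatalf(format string, args ...interface{}) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

func main() {
	q := &fakeQueue{pods: map[string]bool{}}
	r := &recorder{}
	mustAddUnschedulableIfNotPresent(r, q, "pod-a") // succeeds
	mustAddUnschedulableIfNotPresent(r, q, "pod-a") // duplicate, reported as a failure
	fmt.Println(r.failures)
}
```

In a real test the first argument would simply be the `*testing.T` passed to the test function.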

sanposhiho · 2023-09-08T15:14:54Z

@Huang-Wei @alculquicondor rebase done. 🙏

pkg/scheduler/internal/queue/scheduling_queue.go

pkg/scheduler/internal/queue/scheduling_queue_test.go

alculquicondor · 2023-09-08T15:39:53Z

pkg/scheduler/internal/queue/scheduling_queue_test.go

@@ -1325,31 +1333,44 @@ func TestPriorityQueue_MoveAllToActiveOrBackoffQueue(t *testing.T) {
 if p, err := q.Pop(); err != nil || p.Pod != hpp1 {
 t.Errorf("Expected: %v after Pop, but got: %v", hpp1, p.Pod.Name)
 }
+ unschedulableQueuedPodInfo := q.newQueuedPodInfo(unschedulablePodInfo.Pod, "fooPlugin")


This entire test is too hard to read (anyone reading has to keep a mental state of all that happened in the previous steps). I wonder if it can be split into different test cases that target specific transitions.
But that would be for a follow up.

👍
I'd like to do that refactor in a follow-up PR.
I guess somehow we can just merge this test and TestPriorityQueue_MoveAllToActiveOrBackoffQueueWithQueueingHint.

sanposhiho · 2023-09-09T09:38:35Z

@alculquicondor fixed in 9932089.
Let me know when it's ok to squash them.

alculquicondor · 2023-09-11T14:47:55Z

/lgtm
/label tide/merge-method-squash

k8s-ci-robot · 2023-09-11T14:48:02Z

LGTM label has been added.

Git tree hash: 81c9e753437a35019829cbae8a63a9804f897307

…into activeQ/backoffQ (kubernetes#119105) * always put Pods with no unschedulable plugins into activeQ/backoffQ * address review comments

k8s-ci-robot requested review from damemi and denkensk July 5, 2023 13:22

k8s-ci-robot requested a review from alculquicondor July 5, 2023 13:23

sanposhiho force-pushed the bind-failure branch from e085846 to 255782b Compare July 5, 2023 13:42

k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jul 5, 2023

sanposhiho mentioned this pull request Jul 5, 2023

feature(scheduling_queue): track events per Pods #118438

Merged

Huang-Wei reviewed Jul 5, 2023

sanposhiho force-pushed the bind-failure branch from 255782b to 7b0bd11 Compare July 17, 2023 16:03

k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Sep 6, 2023

Huang-Wei reviewed Sep 7, 2023

pohly mentioned this pull request Sep 7, 2023

scheduler: avoid false "unschedulable" pod state #120334

Merged

sanposhiho force-pushed the bind-failure branch 2 times, most recently from d147da1 to b211bfa Compare September 8, 2023 12:26

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2023

k8s-ci-robot requested a review from alculquicondor September 8, 2023 12:26

k8s-ci-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 8, 2023

sanposhiho force-pushed the bind-failure branch from b211bfa to c878dd2 Compare September 8, 2023 13:44

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 8, 2023

always put Pods with no unschedulable plugins into activeQ/backoffQ

4890598

sanposhiho force-pushed the bind-failure branch from c878dd2 to 4890598 Compare September 8, 2023 13:57

sanposhiho commented Sep 8, 2023

alculquicondor reviewed Sep 8, 2023

address review comments

9932089

k8s-ci-robot added tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 11, 2023

k8s-ci-robot merged commit 0d3eafd into kubernetes:master Sep 11, 2023
15 checks passed

k8s-ci-robot added this to the v1.29 milestone Sep 11, 2023

sanposhiho mentioned this pull request Sep 12, 2023

fix(scheduler_one): call Done() as soon as possible #120586

Open
