Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(scheduling_queue): always put Pods with no unschedulable plugins into activeQ/backoffQ #119105

Merged
merged 2 commits into from
Sep 11, 2023

Conversation

sanposhiho
Copy link
Member

@sanposhiho sanposhiho commented Jul 5, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR changes how the scheduling queue regards the scheduling failure.
We sometimes put the Pod back to the queue with the unschedulable plugins, and sometimes we don't.
The difference between these two cases is that the former is the rejection from the plugin, and the latter is the error from the plugin (or the scheduler, which shouldn't happen unless the scheduler has bugs).
More specifically, when the Pod is rejected in PreFilter, Filter, Reserve or Permit, the scheduler attached the failed plugin name to the Pod and put it back to the queue. And, when the Pod is rejected in another extension point (like Bind etc) which is probably because of the kube-apiserver failure (or the bug in the plugin impl), the scheduler put the pod to the queue without any plugin names.
In cases of the latter, it's more likely unexpected rejection and maybe it won't be resolved by cluster events.
This PR changes the scheduling queue to always put Pods without the failed plugins into activeQ/backoffQ so that we can prevent such Pods stuck in unschedQ forever.

Which issue(s) this PR fixes:

Ref: #118438 (comment)

Special notes for your reviewer:

Does this PR introduce a user-facing change?

More accurate requeueing in scheduling queue for Pods rejected by the temporal failure (e.g., temporal failure on kube-apiserver.)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 5, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 5, 2023
@sanposhiho
Copy link
Member Author

/hold

To involve other approvers.

/cc @alculquicondor

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 5, 2023
@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jul 5, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to reproduce the potential bug, and then verify the fix does work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I'll work on that this weekend.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added TestRequeueByPermitRejection.
Please check out sanposhiho@e0cd7a0.
You can see a failed test result where I run the same test in the master branch.

@alculquicondor
Copy link
Member

Also, what if there was no event, but the failure was an apiserver failure?

In this case, the pod should enter the active queue or backoff, but not the unschedulable queue.

@sanposhiho
Copy link
Member Author

sanposhiho commented Jul 14, 2023

@alculquicondor I think that's a good point and could be another blocker of the removal of the flush because Pods, rejected due to the kube-apiserver failure, would stuck in unschedQ in a worst case if we didn't have a flush.

But, I'd say that's kind of difficult from the scheduler side to know whether the error is coming from the kube-apiserver unstable or not.. Because, in the first place, the scheduler cannot know which plugin in which extension point communicates with kube-apiserver.
What do you think? Do we need to create some special returning status so that plugin can let the scheduler know "this error is coming from kube-apiserver failure and the queue should requeue the pod into activeQ/backoffQ in any ways"?

@alculquicondor
Copy link
Member

But, I'd say that's kind of difficult from the scheduler side to know whether the error is coming from the kube-apiserver unstable or not.

I don't think we need to make it specific to apiserver errors, but errors in general (as opposed to "Unschedulable(AndUnResolvable)" statuses).

@sanposhiho
Copy link
Member Author

errors in general (as opposed to "Unschedulable(AndUnResolvable)" statuses).

Sounds good. Let me have an issue for this.

@alculquicondor
Copy link
Member

why not fix in this PR, as it's highly related?

@sanposhiho
Copy link
Member Author

We should put this in 1.28.1

Sure, will do after I'm back to the normal days - this Friday.

/unhold

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Sep 6, 2023
Comment on lines 740 to 742
=======
logger.V(5).Info("Pod moved to an internal scheduling queue", "pod", klog.KObj(pod), "event", ScheduleAttemptFailure, "queue", queue, "schedulingCycle", podSchedulingCycle)
>>>>>>> dc313b9f0b9 (always put Pods with no unschedulable plugins into activeQ/backoffQ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please resolve this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, that's already fixed in the latest commit. I'll squash commits when doing rebase. 🙏

@alculquicondor
Copy link
Member

oh, this is the PR I was thinking of yesterday, @pohly

@sanposhiho sanposhiho force-pushed the bind-failure branch 2 times, most recently from d147da1 to b211bfa Compare September 8, 2023 12:26
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2023
@k8s-ci-robot k8s-ci-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 8, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 8, 2023
@@ -3180,3 +3203,11 @@ func Test_isPodWorthRequeuing(t *testing.T) {
})
}
}

func mustAddUnschedulableIfNotPresent(t *testing.T, q *PriorityQueue, logger klog.Logger, pInfo *framework.QueuedPodInfo, podSchedulingCycle int64) {
Copy link
Member Author

@sanposhiho sanposhiho Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid ↓ in all places in pull-kubernetes-linter-hints.

ERROR: pkg/scheduler/internal/queue/scheduling_queue_test.go:1559:32: Error return value of `q.AddUnschedulableIfNotPresent` is not checked (errcheck)
ERROR: 	q.AddUnschedulableIfNotPresent(logger, q.newQueuedPodInfo(highPriorityPodInfo.Pod, "plugin"), q.SchedulingCycle())

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/119105/pull-kubernetes-linter-hints/1700123530751905792

@sanposhiho
Copy link
Member Author

@Huang-Wei @alculquicondor rebase done. 🙏

pkg/scheduler/internal/queue/scheduling_queue.go Outdated Show resolved Hide resolved
pkg/scheduler/internal/queue/scheduling_queue_test.go Outdated Show resolved Hide resolved
@@ -1325,31 +1333,44 @@ func TestPriorityQueue_MoveAllToActiveOrBackoffQueue(t *testing.T) {
if p, err := q.Pop(); err != nil || p.Pod != hpp1 {
t.Errorf("Expected: %v after Pop, but got: %v", hpp1, p.Pod.Name)
}
unschedulableQueuedPodInfo := q.newQueuedPodInfo(unschedulablePodInfo.Pod, "fooPlugin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire test is too hard to read (anyone reading has to keep a mental state of all that happened in the previous steps). I wonder if it can be split into different test cases that target specific transitions.
But that would be for a follow up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
I'd like to do that refactor in a follow-up PR.
I guess somehow we can just merge this test and TestPriorityQueue_MoveAllToActiveOrBackoffQueueWithQueueingHint.

@sanposhiho
Copy link
Member Author

sanposhiho commented Sep 9, 2023

@alculquicondor fixed in 9932089.
Let me know when it's ok to squash them.

@alculquicondor
Copy link
Member

/lgtm
/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 11, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 81c9e753437a35019829cbae8a63a9804f897307

@k8s-ci-robot k8s-ci-robot merged commit 0d3eafd into kubernetes:master Sep 11, 2023
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Sep 11, 2023
Sharpz7 pushed a commit to Sharpz7/kubernetes that referenced this pull request Oct 27, 2023
…into activeQ/backoffQ (kubernetes#119105)

* always put Pods with no unschedulable plugins into activeQ/backoffQ

* address review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants