Increase leader-election nominal concurrency shares from 10 to 40 #129646
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MikeSpreitzer

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/cc @linxiulei

/sig api-machinery
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
Because some testing shows leader election being starved. Signed-off-by: Mike Spreitzer <[email protected]>
Force-pushed from 092a634 to e22f46e.
/triage accepted
PRs to compare:
Also see:
@linxiulei compared two runs, with and without tweaking the nominal concurrency shares of the leader-election priority level. I have graphed the results; see https://docs.google.com/document/d/1DEexuOLOTmtNDAcAgnSOlVGsDXcq9f5_f_V77qYB3ME and https://docs.google.com/document/d/1Bz4t6A_H1z-jmsOL-Fa05EW9e0DL1uvtb9JzqwXw3bY
@@ -208,7 +208,7 @@ var (
 		flowcontrol.PriorityLevelConfigurationSpec{
 			Type: flowcontrol.PriorityLevelEnablementLimited,
 			Limited: &flowcontrol.LimitedPriorityLevelConfiguration{
-				NominalConcurrencyShares: ptr.To(int32(10)),
+				NominalConcurrencyShares: ptr.To(int32(40)),
Hmm - with this change we're going up from a default total of 245 shares to 275 without changing the lendable percent.
So technically, for use cases that don't need it, we effectively decrease the capacity by ~10%.
Shouldn't we also adjust the LendablePercent here?
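To make the capacity arithmetic above concrete, here is a back-of-the-envelope sketch, not the apiserver's actual code: APF gives each limited priority level roughly serverCL * shares / totalShares nominal seats, so raising one level's shares dilutes the others. The server concurrency limit of 600 is an assumed example; the 245 and 275 totals come from the comment above.

```go
// Back-of-the-envelope sketch only; not the apiserver's actual code.
package main

import "fmt"

// nominalSeats approximates a level's nominal concurrency limit as its
// proportional slice of the server's total concurrency.
func nominalSeats(serverCL, shares, totalShares float64) float64 {
	return serverCL * shares / totalShares
}

func main() {
	// Assumed example: 600 total seats (e.g. the default 400 read + 200 mutating in-flight limits).
	const serverCL = 600.0

	// A level whose shares are untouched by this PR, e.g. one with 100 shares,
	// sees its nominal seats shrink once the total goes from 245 to 275.
	fmt.Printf("untouched level before: %.1f seats\n", nominalSeats(serverCL, 100, 245)) // ~244.9
	fmt.Printf("untouched level after:  %.1f seats\n", nominalSeats(serverCL, 100, 275)) // ~218.2, roughly 10% fewer

	// leader-election itself goes from 10/245 to 40/275 of the pie.
	fmt.Printf("leader-election before: %.1f seats\n", nominalSeats(serverCL, 10, 245)) // ~24.5
	fmt.Printf("leader-election after:  %.1f seats\n", nominalSeats(serverCL, 40, 275)) // ~87.3
}
```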
Interesting suggestion. I have added a commit that increases the lendable percent to 75, thus keeping the unlendable shares at 10 like before this PR. There is a small difference due to the increase in total nominal shares, but I suspect that will be unimportant.
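For the share bookkeeping itself, a tiny sketch of the arithmetic, assuming (as the "keeping the unlendable shares at 10" remark implies) that the level previously lent nothing:

```go
// Rough arithmetic only: unlendable shares = shares * (100 - LendablePercent) / 100.
package main

import "fmt"

func main() {
	fmt.Println(10 * (100 - 0) / 100)  // before this PR: 10 shares, nothing lendable -> 10 unlendable
	fmt.Println(40 * (100 - 75) / 100) // after both commits: 40 shares, 75% lendable -> 10 unlendable
}
```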
Thank you!
> There is a small difference due to the increase in total nominal shares, but I suspect that will be unimportant.
I looked into those numbers. There are changes: for typical values that we use in GKE, I see differences in actually available seats of <15%. This isn't negligible, but I also don't know how to assess the risk of such a change.
With borrowing, I think the risk will be mitigated in an overloaded cluster, but we need the ability to disable this change, so I would like to hide it behind a feature gate (though it can be enabled by default).
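A hedged sketch of the kind of opt-out being suggested here; the gate name and the helper are purely illustrative, the gate is not registered anywhere, and this PR does not currently wire the bootstrap configuration this way:

```go
// Illustrative only: LeaderElectionAPFShares is a hypothetical gate name and
// would still need to be registered in the apiserver's feature-gate registry.
package bootstrap

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/component-base/featuregate"
)

const LeaderElectionAPFShares featuregate.Feature = "LeaderElectionAPFShares" // hypothetical

// leaderElectionShares picks the old or the new default, so operators could
// disable the change by turning the (hypothetical) gate off.
func leaderElectionShares() int32 {
	if utilfeature.DefaultFeatureGate.Enabled(LeaderElectionAPFShares) {
		return 40
	}
	return 10
}
```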
Remember that we have two other config changes being considered now:
- Segregating events (introduced in Events+weighted borrowing #128974);
- rejiggering to stop working around the lack of borrowing ([APF] low priorities have larger effective shares than high priorities #121982 (comment))
.. so that it takes no larger bite from system capacity than before (modulo small difference due to total nominal shares).
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
What type of PR is this?
/kind feature
Actually, kind of hard to classify. This is a tweak to an existing feature, based on results from experience and non-CI testing.
What this PR does / why we need it:
This PR increases the default setting for the nominal concurrency shares of the `leader-election` priority level in the API Priority and Fairness feature. Experience shows that in some stressful situations APF gives too little concurrency to the leader-election requests, which causes the corresponding controller(s) to go out of business (and, due to the stress, a replacement has trouble getting up and elected). This PR changes the default setting of that level's nominal concurrency shares from 10 to 40.

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: