Increase leader-election nominal concurrency shares from 10 to 40 #129646
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MikeSpreitzer

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/cc @linxiulei

/sig api-machinery
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
Because some testing shows leader election being starved. Signed-off-by: Mike Spreitzer <[email protected]>
Force-pushed from 092a634 to e22f46e.
/triage accepted
PRs to compare:
Also see:
@linxiulei compared two runs, with and without tweaking the nominal concurrency shares of the leader-election priority level. I have graphed the results; see https://docs.google.com/document/d/1DEexuOLOTmtNDAcAgnSOlVGsDXcq9f5_f_V77qYB3ME and https://docs.google.com/document/d/1Bz4t6A_H1z-jmsOL-Fa05EW9e0DL1uvtb9JzqwXw3bY
@@ -208,7 +208,7 @@ var (
 		flowcontrol.PriorityLevelConfigurationSpec{
 			Type: flowcontrol.PriorityLevelEnablementLimited,
 			Limited: &flowcontrol.LimitedPriorityLevelConfiguration{
-				NominalConcurrencyShares: ptr.To(int32(10)),
+				NominalConcurrencyShares: ptr.To(int32(40)),
Hmm - with this change we're going up from a default total of 245 shares to 275 without changing the lendable percent.
So technically, for use cases that don't need it, we effectively decrease the capacity by ~10%.
Shouldn't we also adjust the LendablePercent here?
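To make the capacity arithmetic above concrete, here is a back-of-the-envelope sketch, not the apiserver's actual code: APF gives each limited priority level roughly serverCL * shares / totalShares nominal seats, so raising one level's shares dilutes the others. The server concurrency limit of 600 is an assumed example; the 245 and 275 totals come from the comment above.

```go
// Back-of-the-envelope sketch only; not the apiserver's actual code.
package main

import "fmt"

// nominalSeats approximates a level's nominal concurrency limit as its
// proportional slice of the server's total concurrency.
func nominalSeats(serverCL, shares, totalShares float64) float64 {
	return serverCL * shares / totalShares
}

func main() {
	// Assumed example: 600 total seats (e.g. the default 400 read + 200 mutating in-flight limits).
	const serverCL = 600.0

	// A level whose shares are untouched by this PR, e.g. one with 100 shares,
	// sees its nominal seats shrink once the total goes from 245 to 275.
	fmt.Printf("untouched level before: %.1f seats\n", nominalSeats(serverCL, 100, 245)) // ~244.9
	fmt.Printf("untouched level after:  %.1f seats\n", nominalSeats(serverCL, 100, 275)) // ~218.2, roughly 10% fewer

	// leader-election itself goes from 10/245 to 40/275 of the pie.
	fmt.Printf("leader-election before: %.1f seats\n", nominalSeats(serverCL, 10, 245)) // ~24.5
	fmt.Printf("leader-election after:  %.1f seats\n", nominalSeats(serverCL, 40, 275)) // ~87.3
}
```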
Interesting suggestion. I have added a commit that increases the lendable percent to 75, thus keeping the unlendable shares at 10 like before this PR. There is a small difference due to the increase in total nominal shares, but I suspect that will be unimportant.
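For the share bookkeeping itself, a tiny sketch of the arithmetic, assuming (as the "keeping the unlendable shares at 10" remark implies) that the level previously lent nothing:

```go
// Rough arithmetic only: unlendable shares = shares * (100 - LendablePercent) / 100.
package main

import "fmt"

func main() {
	fmt.Println(10 * (100 - 0) / 100)  // before this PR: 10 shares, nothing lendable -> 10 unlendable
	fmt.Println(40 * (100 - 75) / 100) // after both commits: 40 shares, 75% lendable -> 10 unlendable
}
```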
Thank you!
> There is a small difference due to the increase in total nominal shares, but I suspect that will be unimportant.
I looked into those numbers. There are changes: for typical values that we use in GKE, I see differences in actually available seats of <15%. This isn't negligible, but I also don't know how to assess the risk of such a change.
With borrowing, I think the risk will be mitigated in an overloaded cluster, but we need the ability to disable this change, so I would like to hide it behind a feature gate (though it can be enabled by default).
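A hedged sketch of the kind of opt-out being suggested here; the gate name and the helper are purely illustrative, the gate is not registered anywhere, and this PR does not currently wire the bootstrap configuration this way:

```go
// Illustrative only: LeaderElectionAPFShares is a hypothetical gate name and
// would still need to be registered in the apiserver's feature-gate registry.
package bootstrap

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/component-base/featuregate"
)

const LeaderElectionAPFShares featuregate.Feature = "LeaderElectionAPFShares" // hypothetical

// leaderElectionShares picks the old or the new default, so operators could
// disable the change by turning the (hypothetical) gate off.
func leaderElectionShares() int32 {
	if utilfeature.DefaultFeatureGate.Enabled(LeaderElectionAPFShares) {
		return 40
	}
	return 10
}
```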
Remember that we have two other config changes being considered now:
- Segregating events (introduced in Events+weighted borrowing #128974);
- rejiggering to stop working around the lack of borrowing ([APF] low priorities have larger effective shares than high priorities #121982 (comment))
.. so that it takes no larger bite from system capacity than before (modulo small difference due to total nominal shares).
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
What type of PR is this?
/kind feature
Actually, kind of hard to classify. This is a tweak to an existing feature, based on results from experience and non-CI testing.
What this PR does / why we need it:
This PR increases the default setting for the nominal concurrency shares of the `leader-election` priority level in the API Priority and Fairness feature. Experience shows that in some stressful situations APF gives too little concurrency to the leader-election requests, which causes the corresponding controller(s) to go out of business (and, due to the stress, a replacement has trouble getting up and elected). This PR changes the default setting of that level's nominal concurrency shares from 10 to 40.

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: