Skip to content

Apiserver watch from storage without PrevKV option #131862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shadowofs
Copy link

@shadowofs shadowofs commented May 20, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Currently, the Kubernetes API server uses etcd’s PrevKV watch option to retrieve both the new and previous values of a key on each watch event. Under the hood, etcd performs a full range lookup to fetch the previous key-value pair, and all such operations contend on a single read–write lock protecting the internal treeIndex. As cluster scale grows, this coupling means that high write or watch load can starve other etcd requests, leading to increased watch-event latencies and overall pressure on etcd.

When watchCache is enabled in the API server, the only consumer of a watch event’s “previous” value is the Reflector, which uses it simply to identify and remove deleted objects from the cache. However, etcd watch events already include the full key of a deletion event, making the extra value lookup superfluous.

We validated this change on a 5,000‑node production cluster. Disabling PrevKV yielded ≈ 50 % reduction in MVCC range operations within etcd:
image

We also ran benchmarks in a test environment. Disabling PrevKV showed ≈ 20 % increase in overall API throughput:
image

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

a new featureGate `WatchFromStorageWithoutPrevKV` has been added to enable watches from etcd with no prev-kv

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 20, 2025
Copy link

linux-foundation-easycla bot commented May 20, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: shadowofs / name: Ying.S (b40d017)

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 20, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @shadowofs!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 20, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @shadowofs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 20, 2025
@k8s-ci-robot k8s-ci-robot requested review from aojea and cjcullen May 20, 2025 09:26
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 20, 2025
@shadowofs shadowofs force-pushed the feat/apiserver-watch-npkv branch 2 times, most recently from 05b79bc to df4c25e Compare May 20, 2025 11:55
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 20, 2025
@k8s-ci-robot
Copy link
Contributor

@shadowofs: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test all

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@shadowofs shadowofs force-pushed the feat/apiserver-watch-npkv branch from df4c25e to 2c0699d Compare May 20, 2025 12:44
@dims
Copy link
Member

dims commented May 20, 2025

cc @serathius

@xigang
Copy link
Member

xigang commented May 20, 2025

@shadowofs If WithPrevKV() is disabled, will it affect the behavior of the client-side Reflector? For example, will the Update callback function in the EventHandler still receive the previous data?

@dims
Copy link
Member

dims commented May 26, 2025

/retest

@serathius
Copy link
Contributor

Retest will not fix it.

=== RUN   TestAllocateReservedDynamicBlockExhausted
    storage_test.go:59: unexpected error creating etcd: watch prevkv disabled without reverseKeyFunc

@serathius
Copy link
Contributor

I wasn't sure if Allocator or master leases will be a problem here, because I didn't notice they setup a watch. Looks like they are. They are relevant because they store data in etcd under different key structure normal objects. Introducing a reverse key parsing opens a problem that logic for generating keys is spread around the codebase and that there is no guarantee that keys stored in a single storage follow the same schema.

Thinking whether before this PR we should structurize the keys used in storage.

@shadowofs
Copy link
Author

shadowofs commented May 27, 2025

=== RUN   TestAllocateReservedDynamicBlockExhausted
    storage_test.go:59: unexpected error creating etcd: watch prevkv disabled without reverseKeyFunc

Fixed it. I also added a WatchWithoutPrevKV flag to storage.ListOptions. This flag only takes effect for etcd storage, allowing etcd watch to explicitly disable PrevKV via this flag.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shadowofs
Once this PR has been reviewed and has the lgtm label, please ask for approval from wojtek-t. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shadowofs shadowofs requested review from serathius, wojtek-t and dims June 22, 2025 08:49
@xigang
Copy link
Member

xigang commented Jun 23, 2025

/test pull-kubernetes-e2e-kind-alpha-beta-features

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2025
@shadowofs shadowofs force-pushed the feat/apiserver-watch-npkv branch from e3ea326 to 663ad09 Compare June 24, 2025 09:52
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2025
@shadowofs shadowofs requested a review from serathius June 24, 2025 10:01
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2025
@shadowofs shadowofs force-pushed the feat/apiserver-watch-npkv branch from 663ad09 to d9f51b1 Compare June 24, 2025 11:51
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 24, 2025
@shadowofs shadowofs force-pushed the feat/apiserver-watch-npkv branch from d9f51b1 to b40d017 Compare July 1, 2025 09:26
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: other-sig (sig-node-approved)
Development

Successfully merging this pull request may close these issues.

7 participants