Estimate average size of objects in etcd and plug it into request cost estimator #132355

serathius · 2025-06-17T13:43:48Z

/kind feature

Added SizeBasedListCostEstimate feature gate that allows apiserver to estimate sizes of objects to calculate cost of LIST requests

This PR proposes to track object size in resource by caching last observed state for each key and then calculating average size of keys available in etcd. Fetching the list of keys from etcd is required to make sure we remove stale keys that were not deleted due skipped watch events. We could take list of keys from cacher to avoid WithKeysOnly request.

/assign @deads2k @wojtek-t

serathius · 2025-06-17T18:45:51Z

/retest

yongruilin · 2025-06-17T20:18:56Z

/triage accepted

serathius · 2025-06-23T14:37:07Z

Updated implementation to solve both issues:

automatically execute GetKeys periodically if it was not executed within last minute.
Use watch cache to get list of keys if enabled.

I didn't had time to polish the PR, but please review the interfaces cc @wojtek-t @deads2k

serathius · 2025-06-24T13:36:40Z

/retest

wojtek-t

Just a few comments - but overall this lgtm

staging/src/k8s.io/apiserver/pkg/storage/etcd3/store_test.go

staging/src/k8s.io/apiserver/pkg/storage/cacher/watch_cache.go

staging/src/k8s.io/apiserver/pkg/storage/etcd3/stats.go

wojtek-t · 2025-06-25T11:06:12Z

staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go

@@ -792,6 +814,9 @@ func (s *store) GetList(ctx context.Context, key string, opts storage.ListOption

 			// being unable to set the version does not prevent the object from being extracted
 			if matched, err := opts.Predicate.Matches(obj); err == nil && matched {
+				if s.stats != nil {
+					s.stats.AddOrUpdate(string(kv.Key), kv.ModRevision, int64(len(kv.Value)))


I started wondering about the impact of this change on potential list latency (as well as the corresponding watch change on the watch throughput).

Recording the change in cache is fast, but if we want to acquire and release a lock for every single change here - do we have any benchmarks showing the impact?

Rewrote the updates to stats to handle all KVs at once instead of one by one to reduce locking.

Started doing benchmarks for List, preliminary results show 3-7% performance overhead. Pending estimation of watch overhead.

After rewrite it looks like the overhead is at max 5% when we paginate with 150`000 objects. As benchmark uses minimal object size this is worst case scenario.

$ benchstat -col /SizeBasedListCostEstimate bench5.txt │ true │ false │ │ sec/op │ sec/op vs base │ StoreList/Namespaces=10000/Pods=150000/Nodes=5000/RV=/Scope=Cluster/Paginate=0-24 212.3m ± 6% 204.8m ± 2% -3.56% (p=0.001 n=10) StoreList/Namespaces=10000/Pods=150000/Nodes=5000/RV=/Scope=Cluster/Paginate=1000-24 292.3m ± 2% 306.9m ± 1% +5.02% (p=0.000 n=10) StoreList/Namespaces=10000/Pods=150000/Nodes=5000/RV=/Scope=Node/Paginate=0-24 204.1m ± 1% 203.9m ± 3% ~ (p=0.912 n=10) StoreList/Namespaces=10000/Pods=150000/Nodes=5000/RV=/Scope=Namespace/Paginate=0-24 81.17µ ± 2% 83.82µ ± 1% +3.27% (p=0.000 n=10) StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Cluster/Paginate=0-24 206.9m ± 6% 208.5m ± 1% ~ (p=0.579 n=10) StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Cluster/Paginate=1000-24 320.3m ± 1% 325.7m ± 1% +1.69% (p=0.004 n=10) StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Node/Paginate=0-24 205.6m ± 3% 207.3m ± 5% ~ (p=0.684 n=10) StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Namespace/Paginate=0-24 4.034m ± 1% 4.030m ± 1% ~ (p=0.315 n=10) StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Namespace/Paginate=1000-24 3.752m ± 1% 3.703m ± 1% -1.28% (p=0.002 n=10) StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Cluster/Paginate=0-24 148.4m ± 2% 144.7m ± 3% -2.43% (p=0.019 n=10) StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Cluster/Paginate=1000-24 203.1m ± 1% 201.7m ± 1% ~ (p=0.063 n=10) StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Node/Paginate=0-24 147.6m ± 2% 145.1m ± 3% ~ (p=0.143 n=10) StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Node/Paginate=100-24 15.80 ± 1% 15.61 ± 1% -1.16% (p=0.011 n=10) StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Namespace/Paginate=0-24 1.434m ± 1% 1.419m ± 1% -1.02% (p=0.043 n=10) StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Namespace/Paginate=1000-24 1.430m ± 1% 1.418m ± 1% ~ (p=0.089 n=10) geomean 50.00m 49.95m -0.10%

Added the BenchmarkStatsCacheCleanKeys to observe the potential negative impact on watch latency during cleanup.

BenchmarkStatsCacheCleanKeys-24 55 21719938 ns/op 10496345 B/op 515 allocs/op BenchmarkStatsCacheCleanKeys-24 57 28318478 ns/op 10496350 B/op 515 allocs/op BenchmarkStatsCacheCleanKeys-24 62 22347929 ns/op 10496460 B/op 516 allocs/op

It takes 2-3ms to run cleanup for 150'000 keys.

OK - I think that's acceptable thanks!

…t estimator

wojtek-t · 2025-06-26T11:13:41Z

/lgtm
/approve

k8s-ci-robot · 2025-06-26T11:13:48Z

LGTM label has been added.

Git tree hash: 3a0607c1e6a4b1231533052ebe8be461a89ce067

k8s-ci-robot · 2025-06-26T11:13:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: serathius, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/features/OWNERS~~ [wojtek-t]
~~staging/src/k8s.io/apiserver/OWNERS~~ [wojtek-t]
~~test/compatibility_lifecycle/reference/OWNERS~~ [wojtek-t]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot assigned deads2k Jun 17, 2025

k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jun 17, 2025

k8s-ci-robot assigned wojtek-t Jun 17, 2025

serathius mentioned this pull request Jun 17, 2025

Plug size of objects stored into APF estimator #132292

Closed

k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 17, 2025

k8s-ci-robot requested review from aojea and caesarxuchao June 17, 2025 13:44

serathius changed the title ~~Apf estimate size~~ Estimate average size of objects in etcd and plug it into request cost estimator Jun 17, 2025

serathius force-pushed the apf_estimate_size branch 5 times, most recently from 9771f1c to f4ea6bd Compare June 17, 2025 17:50

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 17, 2025

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 17, 2025

serathius force-pushed the apf_estimate_size branch from f4ea6bd to 88bd225 Compare June 18, 2025 07:24

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2025

serathius force-pushed the apf_estimate_size branch from 88bd225 to 018bae7 Compare June 18, 2025 09:05

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2025

k8s-ci-robot requested review from deads2k and wojtek-t June 23, 2025 12:52

k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2025

serathius force-pushed the apf_estimate_size branch from de52fb4 to c0dc3e1 Compare June 23, 2025 14:35

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 23, 2025

serathius force-pushed the apf_estimate_size branch 2 times, most recently from 4bfdf01 to bdf3707 Compare June 23, 2025 21:21

serathius mentioned this pull request Jun 24, 2025

Tune APF list work estimator for large responses #132233

Open

serathius force-pushed the apf_estimate_size branch from bdf3707 to e9173b8 Compare June 24, 2025 10:03

wojtek-t reviewed Jun 25, 2025

View reviewed changes

serathius force-pushed the apf_estimate_size branch 3 times, most recently from 34edc05 to 67b52d7 Compare June 25, 2025 15:25

serathius added 3 commits June 25, 2025 17:25

Estimate average size of objects in etcd and plug it into request cos…

ec78b83

…t estimator

Run background cleanup goroutine

7cb2417

Override getKeys when cacher is enabled

e2c6b7f

serathius force-pushed the apf_estimate_size branch from 67b52d7 to 1d6409a Compare June 25, 2025 15:26

Add benchmark for SizeBasedListCostEstimate feature

1639b09

serathius force-pushed the apf_estimate_size branch from 1d6409a to 1639b09 Compare June 26, 2025 09:09

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2025

k8s-ci-robot merged commit 0256361 into kubernetes:master Jun 26, 2025
14 of 15 checks passed

k8s-ci-robot added this to the v1.34 milestone Jun 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Estimate average size of objects in etcd and plug it into request cost estimator #132355

Estimate average size of objects in etcd and plug it into request cost estimator #132355

Uh oh!

serathius commented Jun 17, 2025 •

edited

Loading

Uh oh!

serathius commented Jun 17, 2025

Uh oh!

yongruilin commented Jun 17, 2025

Uh oh!

serathius commented Jun 23, 2025

Uh oh!

serathius commented Jun 24, 2025

Uh oh!

wojtek-t left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wojtek-t Jun 25, 2025

Uh oh!

serathius Jun 25, 2025

Uh oh!

serathius Jun 26, 2025 •

edited

Loading

Uh oh!

serathius Jun 26, 2025

Uh oh!

wojtek-t Jun 26, 2025

Uh oh!

wojtek-t commented Jun 26, 2025

Uh oh!

k8s-ci-robot commented Jun 26, 2025

Uh oh!

k8s-ci-robot commented Jun 26, 2025

Uh oh!

Uh oh!

Uh oh!

Estimate average size of objects in etcd and plug it into request cost estimator #132355

Estimate average size of objects in etcd and plug it into request cost estimator #132355

Uh oh!

Conversation

serathius commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

serathius commented Jun 17, 2025

Uh oh!

yongruilin commented Jun 17, 2025

Uh oh!

serathius commented Jun 23, 2025

Uh oh!

serathius commented Jun 24, 2025

Uh oh!

wojtek-t left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wojtek-t Jun 25, 2025

Choose a reason for hiding this comment

Uh oh!

serathius Jun 25, 2025

Choose a reason for hiding this comment

Uh oh!

serathius Jun 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

serathius Jun 26, 2025

Choose a reason for hiding this comment

Uh oh!

wojtek-t Jun 26, 2025

Choose a reason for hiding this comment

Uh oh!

wojtek-t commented Jun 26, 2025

Uh oh!

k8s-ci-robot commented Jun 26, 2025

Uh oh!

k8s-ci-robot commented Jun 26, 2025

Uh oh!

Uh oh!

Uh oh!

serathius commented Jun 17, 2025 •

edited

Loading

serathius Jun 26, 2025 •

edited

Loading