Estimate average size of objects in etcd and plug it into request cost estimator #132355
Conversation
/retest
/triage accepted
/retest
Just a few comments - but overall this lgtm
@@ -792,6 +814,9 @@ func (s *store) GetList(ctx context.Context, key string, opts storage.ListOption
	// being unable to set the version does not prevent the object from being extracted
	if matched, err := opts.Predicate.Matches(obj); err == nil && matched {
		if s.stats != nil {
			s.stats.AddOrUpdate(string(kv.Key), kv.ModRevision, int64(len(kv.Value)))
I started wondering about the impact of this change on list latency (and of the corresponding watch change on watch throughput).
Recording the change in the cache is fast, but if we acquire and release a lock for every single change here - do we have any benchmarks showing the impact?
Rewrote the stats updates to handle all KVs at once instead of one by one, to reduce locking.
Started benchmarking List; preliminary results show 3-7% performance overhead. Estimation of the watch overhead is still pending.
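For illustration, a minimal sketch of what batching the updates could look like, assuming a hypothetical statsCache guarded by a mutex (type, field, and method names are illustrative, not the PR's actual API):

package stats

import (
	"sync"

	"go.etcd.io/etcd/api/v3/mvccpb"
)

// statsCache is a hypothetical sketch: it remembers the last observed value
// size per key so that an average object size can be derived later.
type statsCache struct {
	mu    sync.Mutex
	sizes map[string]int64 // last observed value size per key
	revs  map[string]int64 // revision at which that size was observed
}

// AddOrUpdateBatch records sizes for a whole page of KVs under a single lock
// acquisition, instead of locking once per key-value pair.
func (c *statsCache) AddOrUpdateBatch(kvs []*mvccpb.KeyValue) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, kv := range kvs {
		key := string(kv.Key)
		// Skip stale updates arriving from an older revision.
		if kv.ModRevision < c.revs[key] {
			continue
		}
		c.revs[key] = kv.ModRevision
		c.sizes[key] = int64(len(kv.Value))
	}
}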
After the rewrite, the overhead is at most 5%, seen when paginating over 150'000 objects. Since the benchmark uses a minimal object size, this is the worst-case scenario.
$ benchstat -col /SizeBasedListCostEstimate bench5.txt
│ true │ false │
│ sec/op │ sec/op vs base │
StoreList/Namespaces=10000/Pods=150000/Nodes=5000/RV=/Scope=Cluster/Paginate=0-24 212.3m ± 6% 204.8m ± 2% -3.56% (p=0.001 n=10)
StoreList/Namespaces=10000/Pods=150000/Nodes=5000/RV=/Scope=Cluster/Paginate=1000-24 292.3m ± 2% 306.9m ± 1% +5.02% (p=0.000 n=10)
StoreList/Namespaces=10000/Pods=150000/Nodes=5000/RV=/Scope=Node/Paginate=0-24 204.1m ± 1% 203.9m ± 3% ~ (p=0.912 n=10)
StoreList/Namespaces=10000/Pods=150000/Nodes=5000/RV=/Scope=Namespace/Paginate=0-24 81.17µ ± 2% 83.82µ ± 1% +3.27% (p=0.000 n=10)
StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Cluster/Paginate=0-24 206.9m ± 6% 208.5m ± 1% ~ (p=0.579 n=10)
StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Cluster/Paginate=1000-24 320.3m ± 1% 325.7m ± 1% +1.69% (p=0.004 n=10)
StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Node/Paginate=0-24 205.6m ± 3% 207.3m ± 5% ~ (p=0.684 n=10)
StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Namespace/Paginate=0-24 4.034m ± 1% 4.030m ± 1% ~ (p=0.315 n=10)
StoreList/Namespaces=50/Pods=150000/Nodes=5000/RV=/Scope=Namespace/Paginate=1000-24 3.752m ± 1% 3.703m ± 1% -1.28% (p=0.002 n=10)
StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Cluster/Paginate=0-24 148.4m ± 2% 144.7m ± 3% -2.43% (p=0.019 n=10)
StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Cluster/Paginate=1000-24 203.1m ± 1% 201.7m ± 1% ~ (p=0.063 n=10)
StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Node/Paginate=0-24 147.6m ± 2% 145.1m ± 3% ~ (p=0.143 n=10)
StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Node/Paginate=100-24 15.80 ± 1% 15.61 ± 1% -1.16% (p=0.011 n=10)
StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Namespace/Paginate=0-24 1.434m ± 1% 1.419m ± 1% -1.02% (p=0.043 n=10)
StoreList/Namespaces=100/Pods=110000/Nodes=1000/RV=/Scope=Namespace/Paginate=1000-24 1.430m ± 1% 1.418m ± 1% ~ (p=0.089 n=10)
geomean 50.00m 49.95m -0.10%
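For context on how the true/false columns are produced: benchstat -col splits the results on a sub-benchmark name component. A rough sketch of such a setup, with the knob name and wiring assumed rather than taken from the PR's benchmark code:

package etcd3_test

import (
	"fmt"
	"testing"
)

// Hypothetical sketch: run the same benchmark body under both settings of a
// boolean knob so that `benchstat -col /SizeBasedListCostEstimate` can show
// the two settings side by side as the true/false columns above.
func BenchmarkStoreList(b *testing.B) {
	for _, enabled := range []bool{true, false} {
		b.Run(fmt.Sprintf("SizeBasedListCostEstimate=%v", enabled), func(b *testing.B) {
			_ = enabled // placeholder: configure the store with the estimator on/off
			for i := 0; i < b.N; i++ {
				// placeholder: run the List call being measured
			}
		})
	}
}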
Added the BenchmarkStatsCacheCleanKeys benchmark to observe the potential negative impact on watch latency during cleanup.
BenchmarkStatsCacheCleanKeys-24 55 21719938 ns/op 10496345 B/op 515 allocs/op
BenchmarkStatsCacheCleanKeys-24 57 28318478 ns/op 10496350 B/op 515 allocs/op
BenchmarkStatsCacheCleanKeys-24 62 22347929 ns/op 10496460 B/op 516 allocs/op
It takes 2-3ms to run cleanup for 150'000 keys.
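As a rough illustration of what the cleanup measures, a sketch of a stale-key sweep over a hypothetical statsCache (names and layout are assumptions, not the PR's code):

package stats

import "sync"

// statsCache is a hypothetical sketch mirroring the shape used earlier.
type statsCache struct {
	mu    sync.Mutex
	sizes map[string]int64
	revs  map[string]int64
}

// CleanKeys drops every cached entry whose key is absent from liveKeys, the
// key list fetched from etcd (or taken from the cacher) with a keys-only request.
func (c *statsCache) CleanKeys(liveKeys []string) {
	live := make(map[string]struct{}, len(liveKeys))
	for _, k := range liveKeys {
		live[k] = struct{}{}
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.sizes {
		if _, ok := live[k]; !ok {
			delete(c.sizes, k)
			delete(c.revs, k)
		}
	}
}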
OK - I think that's acceptable, thanks!
/lgtm
LGTM label has been added. Git tree hash: 3a0607c1e6a4b1231533052ebe8be461a89ce067
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: serathius, wojtek-t
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/kind feature
Ref #132233
Alternative to #132292
This PR proposes to track object size per resource by caching the size of the last observed state for each key and then calculating the average size of the keys available in etcd. Fetching the list of keys from etcd is required to make sure we remove stale keys that were not deleted due to skipped watch events. We could take the list of keys from the cacher to avoid the WithKeysOnly request.
/assign @deads2k @wojtek-t
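To make the idea concrete, here is a minimal sketch of how cached per-key sizes could be folded into an average object size for the request cost estimator to consume; the names and wiring are assumptions, not this PR's actual implementation:

package stats

import "sync"

// statsCache is a hypothetical sketch: it tracks the last observed value size
// per key plus a running total, so the average can be read in O(1).
type statsCache struct {
	mu        sync.Mutex
	sizes     map[string]int64
	totalSize int64
}

// AddOrUpdate records the size observed for key; revision-based ordering is
// omitted from this sketch for brevity.
func (c *statsCache) AddOrUpdate(key string, rev, size int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if old, ok := c.sizes[key]; ok {
		c.totalSize -= old
	}
	c.sizes[key] = size
	c.totalSize += size
}

// AverageObjectSize returns the mean size of the tracked objects, which is the
// quantity a size-based list cost estimate would be derived from.
func (c *statsCache) AverageObjectSize() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.sizes) == 0 {
		return 0
	}
	return c.totalSize / int64(len(c.sizes))
}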