Benchmarks for node image access multitenancy #131864

Open
wants to merge 2 commits into
base: master

stlaz
Member

@stlaz stlaz commented May 20, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds benchmarks for the Ensure Secret Pulled Images feature.

Which issue(s) this PR fixes:

Related to kubernetes/enhancements#2535

Special notes for your reviewer:

There are two benchmarks: one compares the feature gate being enabled and disabled, and the other is parametrized by cache hit rate and number of records.
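
For readers unfamiliar with parametrized Go benchmarks, here is a minimal sketch of how sub-benchmarks over record count and cache hit rate can be laid out. It is not the code from this PR: `benchCase`, `benchCases`, `isHit` and `BenchmarkLookupSketch` are hypothetical names, and a plain map lookup stands in for the kubelet's image pull manager. Only the parameter values and the `Records=…/CacheHitRate=…` naming are taken from the results in this description.

```go
package main

import (
	"fmt"
	"testing"
)

// benchCase is a hypothetical parameter set; the PR's real types are not shown here.
type benchCase struct {
	Records      int // number of records pre-populated before timing starts
	CacheHitRate int // percentage of lookups expected to find a record
}

// Name mirrors the sub-benchmark naming visible in the benchstat output.
func (c benchCase) Name() string {
	return fmt.Sprintf("Records=%d/CacheHitRate=%d", c.Records, c.CacheHitRate)
}

// benchCases builds the cross product of record counts and hit rates
// that appear in the result tables.
func benchCases() []benchCase {
	var cases []benchCase
	for _, records := range []int{10, 50, 100, 500} {
		for _, rate := range []int{10, 20, 50, 75, 100} {
			cases = append(cases, benchCase{Records: records, CacheHitRate: rate})
		}
	}
	return cases
}

// isHit deterministically marks hitRate out of every 100 lookups as hits.
func isHit(i, hitRate int) bool {
	return i%100 < hitRate
}

// BenchmarkLookupSketch is a stand-in benchmark: a map lookup replaces the
// image pull manager. Place it in a _test.go file to run it with `go test -bench`.
func BenchmarkLookupSketch(b *testing.B) {
	for _, c := range benchCases() {
		b.Run(c.Name(), func(b *testing.B) {
			keys := make([]string, c.Records)
			records := make(map[string]struct{}, c.Records)
			for i := range keys {
				keys[i] = fmt.Sprintf("image-%d", i)
				records[keys[i]] = struct{}{}
			}
			b.ReportAllocs()
			b.ResetTimer() // exclude the setup above from the measurement
			hits := 0
			for i := 0; i < b.N; i++ {
				key := "not-pulled" // a key that is never in the map
				if isHit(i, c.CacheHitRate) {
					key = keys[i%c.Records]
				}
				if _, ok := records[key]; ok {
					hits++
				}
			}
			_ = hits
		})
	}
}

func main() {
	// Print the sub-benchmark names the sketch would produce.
	for _, c := range benchCases() {
		fmt.Println(c.Name())
	}
}
```

Keeping the hit/miss decision deterministic makes runs comparable across the ten samples that benchstat aggregates; a random draw would add noise unrelated to the code under test.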

Raw benchmark results:
directfs_comparison_28GiB.txt
directfs_parametrized_28GiB.txt

  1. Feature enabled/disabled results (run on a system with 28GiB of memory). The columns show the state of the feature: disabled, or enabled with the records accessed directly on the filesystem (DirectFS).
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/pkg/kubelet/images
cpu: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
                                                                                │  Disabled   │               DirectFS               │
                                                                                │   sec/op    │   sec/op     vs base                 │
ImagePullManager_CompareEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   288.8µ ± 5%   680.0µ ± 1%  +135.43% (p=0.000 n=10)

                                                                                │   Disabled   │               DirectFS               │
                                                                                │     B/op     │     B/op      vs base                │
ImagePullManager_CompareEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   39.08Ki ± 2%   61.92Ki ± 2%  +58.44% (p=0.000 n=10)

                                                                                │  Disabled  │               DirectFS               │
                                                                                │ allocs/op  │  allocs/op   vs base                 │
ImagePullManager_CompareEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   45.00 ± 0%   108.00 ± 0%  +140.00% (p=0.000 n=10)

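We can see that there is a performance and memory allocation hit: with the feature enabled, sec/op rises by about 135%, B/op by about 58%, and allocs/op by 140%. The code path with the feature disabled is more or less linear, whereas enabling the feature adds a couple of loops and IO operations, so some performance hit is to be expected. The real-world impact should be rather minimal, though: in absolute terms the enabled path still takes well under a millisecond per operation in this benchmark.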

  2. Results of parametrizing the direct FS access. The columns are the number of records in the cache.
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/pkg/kubelet/images
cpu: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
                                                                                           │     10      │                 50                 │                100                 │                 500                  │
                                                                                           │   sec/op    │   sec/op     vs base               │   sec/op     vs base               │    sec/op     vs base                │
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=10-8    1.143m ± 2%   1.140m ± 2%       ~ (p=0.739 n=10)   1.151m ± 1%       ~ (p=0.105 n=10)    1.217m ± 1%   +6.47% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=20-8    1.074m ± 2%   1.100m ± 2%  +2.44% (p=0.002 n=10)   1.097m ± 1%  +2.16% (p=0.005 n=10)    1.168m ± 2%   +8.72% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=50-8    929.0µ ± 2%   952.3µ ± 2%  +2.51% (p=0.006 n=10)   942.4µ ± 1%  +1.43% (p=0.029 n=10)   1010.9µ ± 2%   +8.81% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=75-8    804.9µ ± 3%   823.2µ ± 2%       ~ (p=0.052 n=10)   825.7µ ± 2%  +2.58% (p=0.015 n=10)    880.3µ ± 2%   +9.37% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=100-8   678.2µ ± 2%   696.1µ ± 2%  +2.65% (p=0.002 n=10)   703.9µ ± 2%  +3.79% (p=0.000 n=10)    750.1µ ± 1%  +10.61% (p=0.000 n=10)
geomean                                                                                      909.6µ        927.1µ       +1.92%                  929.0µ       +2.13%                   989.6µ        +8.79%

                                                                                           │      10      │                 50                  │                 100                 │                 500                 │
                                                                                           │     B/op     │     B/op      vs base               │     B/op      vs base               │     B/op      vs base               │
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=10-8    68.46Ki ± 2%   68.16Ki ± 2%       ~ (p=0.684 n=10)   69.15Ki ± 3%       ~ (p=0.436 n=10)   68.84Ki ± 2%       ~ (p=0.739 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=20-8    68.21Ki ± 3%   67.78Ki ± 3%       ~ (p=0.684 n=10)   67.62Ki ± 2%       ~ (p=0.436 n=10)   67.79Ki ± 3%       ~ (p=0.353 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=50-8    66.26Ki ± 2%   65.86Ki ± 2%       ~ (p=0.218 n=10)   66.42Ki ± 2%       ~ (p=0.971 n=10)   66.28Ki ± 3%       ~ (p=0.684 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=75-8    63.99Ki ± 2%   64.16Ki ± 2%       ~ (p=0.631 n=10)   64.06Ki ± 2%       ~ (p=1.000 n=10)   63.97Ki ± 1%       ~ (p=0.739 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=100-8   62.12Ki ± 1%   62.25Ki ± 2%       ~ (p=0.971 n=10)   61.69Ki ± 2%       ~ (p=0.280 n=10)   62.09Ki ± 1%       ~ (p=0.579 n=10)
geomean                                                                                      65.76Ki        65.60Ki       -0.24%                  65.74Ki       -0.04%                  65.75Ki       -0.02%

                                                                                           │     10     │                50                 │                100                │                500                │
                                                                                           │ allocs/op  │ allocs/op   vs base               │ allocs/op   vs base               │ allocs/op   vs base               │
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=10-8    150.0 ± 1%   150.0 ± 1%       ~ (p=0.973 n=10)   150.0 ± 1%       ~ (p=0.677 n=10)   150.0 ± 0%       ~ (p=0.495 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=20-8    145.0 ± 1%   146.0 ± 1%       ~ (p=0.120 n=10)   145.0 ± 1%       ~ (p=0.978 n=10)   145.0 ± 1%       ~ (p=0.861 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=50-8    132.0 ± 1%   131.0 ± 1%       ~ (p=0.174 n=10)   131.5 ± 0%       ~ (p=0.277 n=10)   131.0 ± 2%       ~ (p=0.438 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=75-8    120.0 ± 1%   120.0 ± 1%       ~ (p=1.000 n=10)   120.0 ± 1%       ~ (p=1.000 n=10)   119.5 ± 0%       ~ (p=0.577 n=10)
ImagePullManageWithEnsureSecretPulledImages/ImageRecordsAccess=DirectFS/CacheHitRate=100-8   108.0 ± 1%   108.0 ± 1%       ~ (p=0.628 n=10)   108.0 ± 0%       ~ (p=0.474 n=10)   108.0 ± 1%       ~ (p=1.000 n=10)
geomean                                                                                      130.1        130.0       -0.01%                  130.0       -0.08%                  129.7       -0.24%

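The cache hit rate has an obvious impact on performance: a higher hit rate lowers sec/op, B/op, and allocs/op. The number of records also affects sec/op, but has no statistically significant effect on B/op or allocs/op. As presented above, when the number of records increases from 10 to 500, sec/op grows by about 6.5% to 10.6% depending on the cache hit rate (about +9.4% at a 75% cache hit rate), which seems acceptable.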
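
The percentages quoted above can be rechecked from the table with a few lines of Go. This is only an illustrative sketch: `pctChange`, `row` and `rows` are hypothetical helpers, and the inputs are the rounded sec/op medians from the 10-record and 500-record columns, converted to microseconds.

```go
package main

import "fmt"

// pctChange returns the relative change from base to value, in percent.
func pctChange(base, value float64) float64 {
	return (value - base) / base * 100
}

// row holds rounded sec/op medians copied from the table above, in microseconds.
type row struct {
	HitRate    int     // cache hit rate in percent
	Records10  float64 // median with 10 records
	Records500 float64 // median with 500 records
}

// rows lists one entry per cache hit rate in the parametrized results.
func rows() []row {
	return []row{
		{HitRate: 10, Records10: 1143, Records500: 1217},
		{HitRate: 20, Records10: 1074, Records500: 1168},
		{HitRate: 50, Records10: 929.0, Records500: 1010.9},
		{HitRate: 75, Records10: 804.9, Records500: 880.3},
		{HitRate: 100, Records10: 678.2, Records500: 750.1},
	}
}

func main() {
	// Print the slowdown from 10 to 500 records for each hit rate.
	for _, r := range rows() {
		fmt.Printf("CacheHitRate=%d: %+.1f%%\n", r.HitRate, pctChange(r.Records10, r.Records500))
	}
}
```

Because the inputs are the rounded medians printed in the table, the results can differ from the benchstat `vs base` column in the second decimal place, presumably because benchstat works from the unrounded samples.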

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/2535

/cc @liggitt @enj

@k8s-ci-robot k8s-ci-robot requested review from enj and liggitt May 20, 2025 12:48
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 20, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 20, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 20, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stlaz
Once this PR has been reviewed and has the lgtm label, please assign derekwaynecarr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stlaz stlaz force-pushed the ensure-secret-images-benchmark branch from ee243f9 to 02f9df8 Compare May 20, 2025 14:22
@stlaz
Member Author

stlaz commented May 20, 2025

/test pull-kubernetes-unit-windows-master
looks like an infra problem

@stlaz stlaz force-pushed the ensure-secret-images-benchmark branch from 02f9df8 to 4699986 Compare May 21, 2025 08:46
@stlaz
Member Author

stlaz commented May 21, 2025

/sig auth

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label May 21, 2025
@enj enj moved this to Needs Triage in SIG Auth May 21, 2025
@aramase aramase moved this from Needs Triage to In Progress in SIG Auth Jun 2, 2025
@stlaz stlaz force-pushed the ensure-secret-images-benchmark branch from 4699986 to cfa5280 Compare June 24, 2025 13:15
Labels
area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: In Progress

2 participants