In-memory caching for node image access multitenancy #131882

Open
wants to merge 5 commits into
base: master
from

stlaz
Member

@stlaz stlaz commented May 21, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds a write-through caching layer between the kubelet's image pull manager and the on-disk image pull records. On a cache miss, the cache falls back to reading the record from disk.

Which issue(s) this PR fixes:

Related-to #kubernetes/enhancements#2535

Special notes for your reviewer:

NOTE: This PR relies, and is rebased, on benchmark code from #131864.

The benchmarks below compare in-memory caching (first with all records cached, then with an LRU cache limited to 100 records) against direct access to the image pull records on disk. They also compare the feature being disabled and enabled.

Raw benchmark results:
directfs_memcache_comparison_28GiB.txt
directfs_memcache_benchmark_28GiB.txt
directfs_memcache_LRU_benchmark_28GiB.txt

  1. Feature enabled/disabled results (run on a system with 28GiB memory) - columns are the state of the feature
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/pkg/kubelet/images
cpu: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
                                                                                │  Disabled   │               DirectFS               │              CachedFS               │
                                                                                │   sec/op    │   sec/op     vs base                 │   sec/op     vs base                │
ImagePullManager_CompareEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   283.8µ ± 3%   689.0µ ± 3%  +142.77% (p=0.000 n=10)   408.3µ ± 2%  +43.85% (p=0.000 n=10)

                                                                                │   Disabled   │               DirectFS               │               CachedFS               │
                                                                                │     B/op     │     B/op      vs base                │     B/op      vs base                │
ImagePullManager_CompareEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   39.22Ki ± 1%   61.44Ki ± 1%  +56.63% (p=0.000 n=10)   58.18Ki ± 2%  +48.32% (p=0.000 n=10)

                                                                                │  Disabled  │               DirectFS               │              CachedFS              │
                                                                                │ allocs/op  │  allocs/op   vs base                 │ allocs/op   vs base                │
ImagePullManager_CompareEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   45.00 ± 0%   108.00 ± 0%  +140.00% (p=0.000 n=10)   58.00 ± 0%  +28.89% (p=0.000 n=10)

At a 100% cache hit rate, caching is considerably faster than accessing the files directly (408.3µs vs. 689.0µs per op), though still slower than with the feature disabled (283.8µs). This behavior is expected.

  2. Comparing the direct FS access to all-records-in-memory caching based on cache hit rate and number of records - columns describe the access method
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/pkg/kubelet/images
cpu: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
                                                                           │   DirectFS   │              CachedFS               │
                                                                           │    sec/op    │   sec/op     vs base                │
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=10-8      1.099m ± 2%   1.125m ± 2%   +2.37% (p=0.011 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=20-8      1.050m ± 3%   1.039m ± 2%        ~ (p=0.165 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=50-8      898.5µ ± 3%   805.2µ ± 4%  -10.39% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=75-8      789.2µ ± 3%   616.9µ ± 4%  -21.83% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=100-8     696.1µ ± 2%   408.1µ ± 2%  -41.38% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=10-8      1.137m ± 1%   1.131m ± 3%   -0.56% (p=0.043 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=20-8      1.089m ± 2%   1.040m ± 2%   -4.50% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=50-8      936.7µ ± 2%   779.5µ ± 2%  -16.79% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=75-8      812.1µ ± 2%   586.9µ ± 4%  -27.73% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=100-8     704.2µ ± 3%   409.0µ ± 2%  -41.92% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=10-8     1.138m ± 1%   1.094m ± 2%   -3.81% (p=0.001 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=20-8     1.058m ± 4%   1.024m ± 3%   -3.17% (p=0.023 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=50-8     931.4µ ± 2%   788.8µ ± 1%  -15.32% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=75-8     820.5µ ± 3%   591.1µ ± 3%  -27.95% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=100-8    684.4µ ± 3%   415.0µ ± 2%  -39.37% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=10-8     1.232m ± 3%   1.163m ± 1%   -5.62% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=20-8     1.175m ± 1%   1.078m ± 2%   -8.21% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=50-8    1001.3µ ± 3%   845.5µ ± 2%  -15.56% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=75-8     847.3µ ± 3%   659.2µ ± 3%  -22.19% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=100-8    722.4µ ± 2%   457.5µ ± 3%  -36.67% (p=0.000 n=10)
geomean                                                                       925.1µ        755.0µ       -18.40%

                                                                           │   DirectFS   │              CachedFS               │
                                                                           │     B/op     │     B/op      vs base               │
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=10-8     68.38Ki ± 2%   68.16Ki ± 2%       ~ (p=0.436 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=20-8     67.65Ki ± 3%   66.54Ki ± 1%  -1.65% (p=0.002 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=50-8     65.57Ki ± 2%   63.60Ki ± 3%  -3.01% (p=0.002 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=75-8     63.65Ki ± 2%   60.53Ki ± 2%  -4.90% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=100-8    61.39Ki ± 2%   57.40Ki ± 1%  -6.50% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=10-8     68.99Ki ± 4%   68.58Ki ± 2%       ~ (p=0.481 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=20-8     67.07Ki ± 2%   67.01Ki ± 2%       ~ (p=0.393 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=50-8     65.80Ki ± 2%   63.50Ki ± 2%  -3.50% (p=0.003 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=75-8     63.29Ki ± 3%   60.06Ki ± 3%  -5.11% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=100-8    61.69Ki ± 2%   57.76Ki ± 2%  -6.37% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=10-8    68.28Ki ± 2%   68.21Ki ± 2%       ~ (p=0.644 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=20-8    68.04Ki ± 2%   66.76Ki ± 3%       ~ (p=0.247 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=50-8    65.75Ki ± 2%   63.69Ki ± 2%  -3.14% (p=0.001 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=75-8    64.90Ki ± 3%   60.47Ki ± 2%  -6.83% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   61.56Ki ± 2%   57.98Ki ± 2%  -5.82% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=10-8    68.77Ki ± 2%   68.62Ki ± 1%       ~ (p=0.481 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=20-8    69.55Ki ± 4%   66.90Ki ± 2%  -3.81% (p=0.005 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=50-8    65.23Ki ± 3%   63.90Ki ± 4%       ~ (p=0.165 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=75-8    63.90Ki ± 2%   60.82Ki ± 3%  -4.81% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=100-8   61.80Ki ± 2%   58.45Ki ± 3%  -5.43% (p=0.000 n=10)
geomean                                                                      65.51Ki        63.33Ki       -3.33%

                                                                           │  DirectFS   │              CachedFS              │
                                                                           │  allocs/op  │ allocs/op   vs base                │
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=10-8      150.0 ± 0%   146.0 ± 1%   -2.67% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=20-8      146.0 ± 1%   135.5 ± 1%   -7.19% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=50-8      131.0 ± 1%   106.0 ± 2%  -19.08% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=75-8     119.50 ± 0%   82.00 ± 1%  -31.38% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=100-8    108.00 ± 0%   58.00 ± 0%  -46.30% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=10-8      150.5 ± 1%   146.0 ± 1%   -2.99% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=20-8      145.0 ± 1%   136.0 ± 1%   -6.21% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=50-8      131.0 ± 1%   106.5 ± 1%  -18.70% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=75-8     119.00 ± 1%   81.00 ± 1%  -31.93% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=100-8    108.00 ± 1%   58.00 ± 0%  -46.30% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=10-8     150.0 ± 1%   146.0 ± 1%   -2.67% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=20-8     146.0 ± 1%   136.5 ± 2%   -6.51% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=50-8     131.0 ± 1%   107.0 ± 2%  -18.32% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=75-8    120.00 ± 1%   82.00 ± 2%  -31.67% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   108.00 ± 0%   58.00 ± 0%  -46.30% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=10-8     150.0 ± 1%   146.0 ± 1%   -2.67% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=20-8     145.0 ± 1%   137.0 ± 1%   -5.52% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=50-8     131.0 ± 1%   107.0 ± 1%  -18.32% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=75-8    119.50 ± 0%   82.00 ± 1%  -31.38% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=100-8   108.00 ± 0%   58.00 ± 0%  -46.30% (p=0.000 n=10)
geomean                                                                       129.9        100.1       -22.91%

As expected, the overhead of the write-through cache shows at low record counts and low cache hit rates (for example, +2.37% sec/op at 10 records and a 10% hit rate). In general, though, the performance improvement is apparent in scenarios that are more likely in the real world: above a 50% hit rate and with 50-70 records. The number of allocations also drops substantially, most likely because the records don't need to be encoded/decoded as often.

  3. Comparing the direct FS access to in-memory LRU caching with max 100 records based on cache hit rate and number of records - columns describe the access method
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/pkg/kubelet/images
cpu: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
                                                                           │  DirectFS   │               CachedFS               │
                                                                           │   sec/op    │    sec/op     vs base                │
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=10-8     1.097m ± 2%   1.092m ±  1%        ~ (p=0.631 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=20-8     1.031m ± 2%   1.004m ±  2%   -2.58% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=50-8     890.8µ ± 2%   768.7µ ±  2%  -13.71% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=75-8     780.2µ ± 1%   596.3µ ±  2%  -23.57% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=100-8    684.3µ ± 2%   418.7µ ±  2%  -38.81% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=10-8     1.082m ± 2%   1.108m ±  1%   +2.45% (p=0.001 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=20-8     1.046m ± 1%   1.046m ±  2%        ~ (p=0.912 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=50-8     899.3µ ± 1%   810.2µ ±  1%   -9.91% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=75-8     790.2µ ± 2%   602.9µ ±  3%  -23.70% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=100-8    691.6µ ± 1%   430.0µ ±  1%  -37.83% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=10-8    1.100m ± 3%   1.108m ±  1%        ~ (p=0.353 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=20-8    1.058m ± 1%   1.079m ± 15%        ~ (p=0.280 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=50-8    907.2µ ± 2%   867.6µ ±  2%   -4.36% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=75-8    804.0µ ± 1%   667.0µ ±  1%  -17.04% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   696.2µ ± 2%   434.9µ ±  1%  -37.54% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=10-8    1.127m ± 1%   1.142m ± 13%   +1.32% (p=0.043 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=20-8    1.071m ± 2%   1.086m ±  7%   +1.39% (p=0.023 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=50-8    932.4µ ± 3%   905.3µ ±  2%   -2.90% (p=0.011 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=75-8    811.4µ ± 2%   758.4µ ±  2%   -6.53% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=100-8   707.8µ ± 2%   589.4µ ±  1%  -16.73% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=10-8    1.187m ± 1%   1.208m ±  5%   +1.85% (p=0.002 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=20-8    1.116m ± 2%   1.143m ±  1%   +2.42% (p=0.005 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=50-8    973.5µ ± 1%   967.0µ ±  2%        ~ (p=0.063 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=75-8    859.4µ ± 2%   839.8µ ±  3%   -2.27% (p=0.035 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=100-8   754.5µ ± 2%   704.6µ ±  1%   -6.62% (p=0.000 n=10)
geomean                                                                      910.4µ        815.6µ        -10.41%

                                                                           │   DirectFS   │              CachedFS               │
                                                                           │     B/op     │     B/op      vs base               │
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=10-8     69.31Ki ± 2%   69.30Ki ± 2%       ~ (p=0.853 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=20-8     68.09Ki ± 2%   67.33Ki ± 3%       ~ (p=0.143 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=50-8     66.30Ki ± 2%   63.90Ki ± 2%  -3.62% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=75-8     63.43Ki ± 2%   61.31Ki ± 1%  -3.35% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=100-8    62.06Ki ± 3%   57.18Ki ± 1%  -7.86% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=10-8     68.74Ki ± 2%   69.34Ki ± 2%       ~ (p=0.315 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=20-8     69.02Ki ± 2%   68.55Ki ± 2%       ~ (p=0.280 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=50-8     66.38Ki ± 2%   63.79Ki ± 1%  -3.91% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=75-8     63.43Ki ± 2%   61.52Ki ± 2%  -3.01% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=100-8    61.54Ki ± 3%   58.28Ki ± 2%  -5.29% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=10-8    69.21Ki ± 2%   69.70Ki ± 2%       ~ (p=0.247 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=20-8    68.29Ki ± 2%   68.63Ki ± 2%       ~ (p=0.190 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=50-8    66.64Ki ± 2%   65.38Ki ± 3%  -1.90% (p=0.035 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=75-8    63.61Ki ± 1%   61.93Ki ± 2%  -2.64% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   62.04Ki ± 2%   57.77Ki ± 1%  -6.89% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=10-8    68.31Ki ± 3%   69.31Ki ± 2%       ~ (p=0.089 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=20-8    68.21Ki ± 2%   68.69Ki ± 2%       ~ (p=0.971 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=50-8    66.38Ki ± 3%   65.18Ki ± 4%       ~ (p=0.123 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=75-8    63.75Ki ± 3%   62.72Ki ± 1%  -1.63% (p=0.043 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=100-8   61.56Ki ± 2%   59.84Ki ± 2%  -2.80% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=10-8    69.62Ki ± 2%   69.57Ki ± 2%       ~ (p=1.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=20-8    68.14Ki ± 2%   68.79Ki ± 3%       ~ (p=0.089 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=50-8    65.63Ki ± 1%   65.94Ki ± 2%       ~ (p=0.436 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=75-8    63.57Ki ± 3%   63.81Ki ± 1%       ~ (p=0.912 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=100-8   61.92Ki ± 2%   60.44Ki ± 2%  -2.39% (p=0.000 n=10)
geomean                                                                      65.75Ki        64.60Ki       -1.75%

                                                                           │  DirectFS   │              CachedFS              │
                                                                           │  allocs/op  │ allocs/op   vs base                │
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=10-8      151.0 ± 1%   154.0 ± 1%   +1.99% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=20-8      145.0 ± 1%   142.5 ± 2%   -1.72% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=50-8      131.5 ± 0%   108.5 ± 2%  -17.49% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=75-8     120.00 ± 1%   83.50 ± 2%  -30.42% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=10/CacheHitRate=100-8    108.00 ± 1%   58.00 ± 0%  -46.30% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=10-8      150.0 ± 1%   155.5 ± 1%   +3.67% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=20-8      146.0 ± 1%   148.0 ± 1%   +1.37% (p=0.002 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=50-8      132.0 ± 1%   116.0 ± 1%  -12.12% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=75-8     120.00 ± 1%   84.00 ± 2%  -30.00% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=50/CacheHitRate=100-8    108.00 ± 1%   58.00 ± 0%  -46.30% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=10-8     150.0 ± 0%   156.0 ± 1%   +4.00% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=20-8     145.0 ± 1%   150.0 ± 0%   +3.45% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=50-8     131.0 ± 1%   124.0 ± 2%   -5.34% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=75-8    119.00 ± 1%   96.50 ± 3%  -18.91% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=100/CacheHitRate=100-8   108.00 ± 0%   58.00 ± 0%  -46.30% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=10-8     150.0 ± 1%   157.0 ± 1%   +4.67% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=20-8     145.5 ± 0%   150.0 ± 1%   +3.09% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=50-8     132.0 ± 1%   129.0 ± 2%   -2.27% (p=0.002 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=75-8     119.0 ± 2%   109.0 ± 2%   -8.40% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=200/CacheHitRate=100-8   108.00 ± 1%   84.00 ± 1%  -22.22% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=10-8     151.0 ± 1%   157.0 ± 1%   +3.97% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=20-8     145.0 ± 1%   151.0 ± 1%   +4.14% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=50-8     131.0 ± 1%   133.0 ± 2%   +1.53% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=75-8     120.0 ± 1%   117.0 ± 2%   -2.50% (p=0.000 n=10)
ImagePullManageWithEnsureSecretPulledImages/Records=500/CacheHitRate=100-8    108.0 ± 0%   101.0 ± 1%   -6.48% (p=0.000 n=10)
geomean                                                                       130.0        113.8       -12.47%

The results show that the performance gain declines quickly once the number of records exceeds the LRU cache capacity of 100: at a 100% hit rate, the sec/op improvement drops from about 37.5% at 100 records to 16.7% at 200 records and 6.6% at 500 records. However, the benchmark currently cannot properly test the assumptions behind the LRU strategy: cache hits are generated more or less at random, which likely does not match the expected usage.
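
To illustrate why the gain shrinks once the record count exceeds the capacity, here is a minimal sketch of LRU eviction. It is not the cache from this PR: `lruCache` and `entry` are hypothetical names, the sketch is not safe for concurrent use, and the capacity of 2 is chosen only to keep the example short.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores; the key is kept so the map
// entry can be deleted on eviction.
type entry struct {
	key   string
	value []byte
}

// lruCache keeps at most capacity records. The front of order is the
// most recently used record, the back is the eviction candidate.
// Not safe for concurrent use.
type lruCache struct {
	capacity int
	order    *list.List
	items    map[string]*list.Element
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    map[string]*list.Element{},
	}
}

// Get returns the record and marks it as most recently used.
func (c *lruCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a record and evicts the least recently used
// one if the cache grows past its capacity.
func (c *lruCache) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := newLRUCache(2)
	cache.Put("a", []byte("1"))
	cache.Put("b", []byte("2"))
	cache.Get("a")              // "a" is now the most recently used
	cache.Put("c", []byte("3")) // evicts "b", the least recently used

	for _, key := range []string{"a", "b", "c"} {
		_, ok := cache.Get(key)
		fmt.Printf("%s cached: %v\n", key, ok)
	}
	// Prints:
	// a cached: true
	// b cached: false
	// c cached: true
}
```

With uniformly random lookups over more records than the cache can hold, a large share of requests lands on evicted records and has to go to disk, which would be consistent with the shrinking improvement in the table above.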

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/2535

/cc liggitt enj
/sig node
/sig auth

@k8s-ci-robot k8s-ci-robot requested review from enj and liggitt May 21, 2025 11:12
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels May 21, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 21, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label May 21, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stlaz
Once this PR has been reviewed and has the lgtm label, please assign derekwaynecarr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stlaz stlaz force-pushed the ensure-secret-images-memcache branch from 15d56c1 to a34c079 Compare May 21, 2025 14:04
@enj enj moved this to Needs Triage in SIG Auth May 21, 2025
@aramase aramase moved this from Needs Triage to In Review in SIG Auth Jun 2, 2025
@stlaz
Member Author

stlaz commented Jun 4, 2025

/retitle [WIP] In-memory caching for node image access multitenancy
The current implementation loads all the file-based records into memory on init. @benjaminapetersen reminded me that this is perhaps a somewhat naive implementation and that we may want to limit the number of records kept in memory. I'll add some kind of LRU mechanism to the cache.

@k8s-ci-robot k8s-ci-robot changed the title In-memory caching for node image access multitenancy [WIP] In-memory caching for node image access multitenancy Jun 4, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2025
@bart0sh bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Jun 10, 2025
@stlaz stlaz force-pushed the ensure-secret-images-memcache branch from a34c079 to 8779a03 Compare June 24, 2025 13:15
@k8s-ci-robot
Contributor

@stlaz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-unit-windows-master 8779a03 link false /test pull-kubernetes-unit-windows-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@stlaz stlaz changed the title [WIP] In-memory caching for node image access multitenancy In-memory caching for node image access multitenancy Jun 25, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2025
Labels
area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

2 participants