Index pods on namespace and labels so KCM controllers can query the informer cache efficiently. Optimize Endpoint and EndpointSlice controller lock contention #132396
Conversation
/assign @aojea
/release-note-none
/retest
// For each label in the pod, create an index key in the format of "ns:<namespace>/label:<key>=<value>"
indexKeys := make([]string, 0, len(pod.Labels))
for k, v := range pod.Labels {
	indexKeys = append(indexKeys, fmt.Sprintf("ns:%s/label:%s=%s", pod.Namespace, k, v))
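For context, a minimal sketch (not the PR's exact code) of how an index function with this key format could be registered on the shared pod informer; the constant and function names below are illustrative stand-ins for the PR's controller.AddPodNamespaceLabelIndexer helper:

```go
// Sketch only: register a namespace+label index on the shared pod informer.
// Names are illustrative; the PR's helper is controller.AddPodNamespaceLabelIndexer.
package controller

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

const podNamespaceLabelIndex = "podNamespaceLabel"

// podNamespaceLabelIndexFunc emits one key per pod label in the form
// "ns:<namespace>/label:<key>=<value>", matching the hunk above.
func podNamespaceLabelIndexFunc(obj interface{}) ([]string, error) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return nil, nil
	}
	keys := make([]string, 0, len(pod.Labels))
	for k, v := range pod.Labels {
		keys = append(keys, fmt.Sprintf("ns:%s/label:%s=%s", pod.Namespace, k, v))
	}
	return keys, nil
}

// addPodNamespaceLabelIndexer registers the index; callers typically ignore
// the "indexer already exists" style error, as the controller diffs below do.
func addPodNamespaceLabelIndexer(informer cache.SharedIndexInformer) error {
	return informer.AddIndexers(cache.Indexers{podNamespaceLabelIndex: podNamespaceLabelIndexFunc})
}
```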
curious question, what is the impact on memory consumption?
The benchmark shows a ~3x increase in allocations, from 21 allocs/op to 73 allocs/op.
Hmm - indexing on every single label of every pod is super risky IMHO, and in certain deployments may lead to a very substantial number of index keys, which I don't think is a good direction.
Yeah, we have experience with Cilium depending on labels to create indexes for network policies, and that scales very badly. The fact that labels also mutate during the pod lifecycle makes it problematic too. @hakuna-matatah, I agree with @wojtek-t; better not to pursue this path.
/hold
@aojea @wojtek-t really appreciate the input — I agree that unbounded label cardinality or mutation is concerning in terms of memory consumption. I want to clarify that the goal of the benchmark was to show performance impact, not optimize for memory per se.
Let's walk through the memory requirements to better quantify the tradeoff:
N = number of pods
X = number of labels unique to each pod (not shared with any other pod)
Y = number of labels shared across all pods
Rough memory usage:
Slice entry pointer = 8 B
Index key string (namespace + key + value) ~= 63 B
Unique-keys memory footprint: (N * X) * (63 B index key + 8 B ptr)
Shared-keys memory footprint: Y * (63 B index key + N * 8 B ptr)
Total memory approximation ~= unique-keys footprint + shared-keys footprint
If we take 150K pods in the worst case, with 10 unique labels and 10 shared labels per pod:
Unique-key memory footprint ~= 150K * 10 * (63 B + 8 B) ~= 106 MB
Shared-key memory footprint ~= 10 * (63 B + 150K * 8 B) ~= 12 MB
Total memory footprint approximation ~= 118 MB
With slice headers and map overhead it might add up to ~200 MB overall; still, I feel ~200 MB for 10 shared + 10 unique labels across 150K pods is OK?
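For concreteness, a tiny back-of-envelope sketch reproducing the arithmetic above (the byte sizes are assumptions, not measurements):

```go
// Back-of-envelope check of the estimate above; sizes are assumptions, not measurements.
package main

import "fmt"

func main() {
	const (
		pods         = 150_000
		uniqueLabels = 10 // labels unique to each pod
		sharedLabels = 10 // labels shared across all pods
		keyBytes     = 63 // "ns:<namespace>/label:<key>=<value>" index key string
		ptrBytes     = 8  // one slice entry per pod per index key
	)
	unique := pods * uniqueLabels * (keyBytes + ptrBytes) // each unique key holds a single pod
	shared := sharedLabels * (keyBytes + pods*ptrBytes)   // each shared key holds all pods
	fmt.Printf("unique ~ %d MB, shared ~ %d MB, total ~ %d MB\n",
		unique/1_000_000, shared/1_000_000, (unique+shared)/1_000_000)
	// Prints: unique ~ 106 MB, shared ~ 12 MB, total ~ 118 MB (before slice-header/map overhead)
}
```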
Given the ~15x performance gain in label-based pod lookups, I believe this could be a decent tradeoff in clusters where labels are controlled. To be safe, I propose:
Guarding the label indexing behind a KCM flag like --enable-pod-label-indexing (defaulting to false)
Cluster operators can opt in if they know label cardinality is bounded and pod labels are stable
This gives flexibility to performance-sensitive clusters without imposing memory costs on the general case. Let me know if this direction sounds reasonable.
I'd like to hear your thoughts on guarding it with a KCM flag and leaving it to Kubernetes users to opt in.
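If the opt-in route is taken, the gate itself could be as small as the sketch below; the boolean and helper names are hypothetical (mirroring the proposal above), and it assumes the same package and imports as the indexer sketch earlier in the thread:

```go
// Hypothetical wiring for the proposed --enable-pod-label-indexing opt-in;
// the flag/field names are illustrative, not existing KCM options.
func maybeAddPodNamespaceLabelIndexer(informer cache.SharedIndexInformer, enablePodLabelIndexing bool) error {
	if !enablePodLabelIndexing {
		// Default (false): skip the index entirely so clusters with high or
		// unstable label cardinality pay no extra memory.
		return nil
	}
	return addPodNamespaceLabelIndexer(informer) // see the earlier indexer sketch
}
```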
sgtm, some comments. /assign @wojtek-t for the scalability angle and the analysis of the trade-off between CPU and memory; since the number of labels is unbounded, the number of keys in the new index can grow.
@@ -109,6 +109,10 @@ func NewEndpointController(ctx context.Context, podInformer coreinformers.PodInf
	e.podLister = podInformer.Lister()
	e.podsSynced = podInformer.Informer().HasSynced

	// Initialize the pod indexer
	controller.AddPodNamespaceLabelIndexer(podInformer.Informer()) //nolint:errcheck
Why not check the error?
We do the same in multiple controllers today; this follows the existing pattern.
@@ -136,6 +136,9 @@ func NewController(ctx context.Context, podInformer coreinformers.PodInformer,
	})
	c.podLister = podInformer.Lister()
	c.podsSynced = podInformer.Informer().HasSynced
	// Initialize the pod indexer
	controller.AddPodNamespaceLabelIndexer(podInformer.Informer()) //nolint:errcheck
Same here
Namespace: "default",
Name:      fmt.Sprintf("pod-%d", i),
UID:       types.UID(fmt.Sprintf("uid-%d", i)),
Labels:    map[string]string{"app": v},
This benchmark shows the perf diff when there's 1 label.
It'd be good to also include 5 labels and 10 labels.
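One possible shape for that, assuming the index function name from the sketch earlier in the thread (illustrative, not the PR's actual benchmark):

```go
package controller

import (
	"fmt"
	"testing"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Benchmarks the index function at 1, 5 and 10 labels per pod.
func BenchmarkPodNamespaceLabelIndexFunc(b *testing.B) {
	for _, labelCount := range []int{1, 5, 10} {
		b.Run(fmt.Sprintf("labels=%d", labelCount), func(b *testing.B) {
			labels := make(map[string]string, labelCount)
			for i := 0; i < labelCount; i++ {
				labels[fmt.Sprintf("key-%d", i)] = fmt.Sprintf("value-%d", i)
			}
			pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{
				Namespace: "default",
				Name:      "pod-0",
				Labels:    labels,
			}}
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				if _, err := podNamespaceLabelIndexFunc(pod); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```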
/hold
/retest
What type of PR is this?
/kind feature
What this PR does / why we need it:
What?
Why?
TL;DR
Details:
We list all pods from the store in the endpoint controller - xref
We list all pods from the store in the endpointslice controller - xref
For every add/update/delete pod event belonging to a service, and/or for every add/update/delete service event that gets enqueued, these controllers hold the cache lock while processing each such item, with time complexity on the order of O(# of pods in the namespace). In the worst case this can be several hundred milliseconds (including the time to wait for and acquire the lock), based on the benchmarking I ran a couple of months ago when I opened this issue. The impact is especially pronounced in the kube-system namespace, where kube-proxy and other core components live.
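For reference, the shape of the change: instead of listing every pod in the service's namespace and filtering by selector, the controller can ask the index for pods matching a selector term and apply the remaining selector terms to the much smaller result. A sketch, assuming the index name and package from the indexer sketch in the review thread above rather than the PR's exact code:

```go
// Sketch: fetch candidate pods via the namespace+label index instead of
// listing the whole namespace; remaining selector terms are applied afterwards.
func podsMatchingSelectorTerm(indexer cache.Indexer, namespace, labelKey, labelValue string) ([]*v1.Pod, error) {
	objs, err := indexer.ByIndex(podNamespaceLabelIndex,
		fmt.Sprintf("ns:%s/label:%s=%s", namespace, labelKey, labelValue))
	if err != nil {
		return nil, err
	}
	pods := make([]*v1.Pod, 0, len(objs))
	for _, obj := range objs {
		if pod, ok := obj.(*v1.Pod); ok {
			pods = append(pods, pod)
		}
	}
	return pods, nil
}
```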
Benchmarking Results:
Which issue(s) this PR is related to:
Fixes #130767
Special notes for your reviewer:
Some PRs I made in the past improving lock contention: #130859, #130961, #132305
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: