client-go/reflector: stop exposing UseWatchList #132453
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/assign @wojtek-t
@@ -437,9 +436,6 @@ func NewCacherFromConfig(config Config) (*Cacher, error) {
 	// We don't want to terminate all watchers as recreating all watchers puts high load on api-server.
 	// In most of the cases, leader is reelected within few cycles.
 	reflector.MaxInternalErrorRetryDuration = time.Second * 30
-	// since the watch-list is provided by the watch cache instruct
-	// the reflector to issue a regular LIST against the store
-	reflector.UseWatchList = ptr.To(false)
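For context, after this removal the decision is no longer made by the cacher at all: the reflector decides internally whether to use the watch-list protocol. A minimal sketch of that decision, assuming the WatchListClient feature gate from k8s.io/client-go/features (the helper name is hypothetical, not code from this PR):

    import clientfeatures "k8s.io/client-go/features"

    // watchListEnabled is a hypothetical helper illustrating the new behavior:
    // NewReflectorWithOptions consults the client feature gate once, at
    // construction time, instead of exposing a mutable UseWatchList field.
    func watchListEnabled() bool {
        return clientfeatures.FeatureGates().Enabled(clientfeatures.WatchListClient)
    }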
Thinking through potential concerns:
- "go back in time" - this doesn't change, because without list-watch, we were using "rv=0" on startup anyway, right?
- does list-watch even work with etcd? etcd won't send you the bookmark, so we don't even know when the "list" part is finished. how is that solved?
- does list-watch even work with etcd? etcd won't send you the bookmark, so we don't even know when the "list" part is finished. how is that solved?
yes, we added support for streaming directly to the etcd storage layer - #119557
"go back in time" - this doesn't change, because without list-watch, we were using "rv=0" on startup anyway, right?
I think this translates to Quorum list + watch stream, both on startup and on resumption, because of RV=0, RVM=NotOlderThan, SendInitialEvents=true.
yes, we added support for streaming directly to the etcd storage layer - #119557
OK - great, forgot about it.
I think this translates to Quorum list + watch stream both on startup and on resumption, because of RV=0, RVM=NotOlderThan, SendInitialEvents=true
RV=0, RVM=NotOlderThan is not a Quorum list - it's "give me anything not older than 0", so literally anything.
RV="" is what we need here.
sorry, since the cacher doesn't set the RV it actually will be RV="", which also translates to Quorum list + watch stream
It's not handled by cacher - this part is explicitly handled by reflector.
But for posterity, we seem to be good here:
- With regular list (the old way):
  - the initial list uses relistResourceVersion():
    options := metav1.ListOptions{ResourceVersion: r.relistResourceVersion()}
  - it sets RV="0" initially and later reuses the last seen RV (staging/src/k8s.io/client-go/tools/cache/reflector.go, lines 998 to 1003 at 6eff9db):

    if r.lastSyncResourceVersion == "" {
        // For performance reasons, initial list performed by reflector uses "0" as resource version to allow it to
        // be served from the watch cache if it is enabled.
        return "0"
    }
    return r.lastSyncResourceVersion

- With listwatch (the new way):
  - we're using rewatchResourceVersion() (staging/src/k8s.io/client-go/tools/cache/reflector.go, lines 739 to 751 at 6eff9db):

    lastKnownRV := r.rewatchResourceVersion()
    temporaryStore = NewStore(DeletionHandlingMetaNamespaceKeyFunc)
    // TODO(#115478): large "list", slow clients, slow network, p&f
    // might slow down streaming and eventually fail.
    // maybe in such a case we should retry with an increased timeout?
    timeoutSeconds := int64(r.minWatchTimeout.Seconds() * (rand.Float64() + 1.0))
    options := metav1.ListOptions{
        ResourceVersion:      lastKnownRV,
        AllowWatchBookmarks:  true,
        SendInitialEvents:    pointer.Bool(true),
        ResourceVersionMatch: metav1.ResourceVersionMatchNotOlderThan,
        TimeoutSeconds:       &timeoutSeconds,
    }

  - this uses lastSyncResourceVersion (return r.lastSyncResourceVersion), which is RV="" initially and later reuses the last seen RV.
So we seem to be good here.
It's not handled by cacher - this part is explicitly handled by reflector.
@wojtek-t the cacher provides a ListWatcher which is used by the reflector. The ListWatcher ignores the RV passed from the reflector, here: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/cacher/lister_watcher.go#L61
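A simplified sketch of that code path (fields approximated from storage.ListOptions): when translating the reflector's request into a storage request, the cacher's ListerWatcher copies the match strategy but not the resource version, so the storage list defaults to RV="", i.e. a quorum read:

    // Sketch of lister_watcher.go: options.ResourceVersion is deliberately
    // not propagated, so the storage layer sees RV="" (a quorum read).
    opts := storage.ListOptions{
        ResourceVersionMatch: options.ResourceVersionMatch,
        Predicate:            pred,
        Recursive:            true,
    }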
But if we use the "watchlist" feature, the List call will not be used at all, because the reflector will call watch:
w, err = r.listerWatcher.WatchWithContext(ctx, options)
I thought you were worried about a failure mode where the new mode fails and we fall back to the standard list.
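For reference, a rough sketch of that fallback inside the reflector (simplified; logging and stop handling elided):

    // If the watch-list stream fails, the reflector falls back to the
    // classic list+watch path instead of surfacing the error.
    if useWatchList {
        w, err = r.watchList(ctx)
        if err != nil {
            fallbackToList = true
            w = nil
        }
    }
    if fallbackToList {
        if err := r.list(ctx); err != nil {
            return err
        }
    }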
I think we might have the following cases:
- RV="" for list translates to quorum read
- RV=0 for list translates to quorum read
- RV="" for watchlist also translates to quorum read
- RV=0 for watchlist translates to quorum read
- RV>0 for watchlist translates to quorum read + check RV <= etcdRV

- On startup and fallback we use list+watch with RV="" which translates to quorum read.
- On resumption and fallback we use list+watch with RV="" which translates to quorum read.
- On expiration and fallback we use list+watch with RV="" which translates to quorum read.
- On startup for watchlist we use RV=0 which translates to quorum read.
- On resumption for watchlist we use RV>0 which translates to quorum read.
- On expiration for watchlist we use RV="" which translates to quorum read.
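Restating those cases as request sketches (the metav1 option names are real; lastSeenRV is a placeholder):

    // Fallback list+watch (startup, resumption, expiration): RV="" => quorum read.
    fallback := metav1.ListOptions{ResourceVersion: ""}

    // Watch-list resumption: RV>0 with NotOlderThan => quorum read plus a check
    // that the requested RV is not ahead of etcd's current RV.
    resume := metav1.ListOptions{
        ResourceVersion:      lastSeenRV, // e.g. "12345"
        ResourceVersionMatch: metav1.ResourceVersionMatchNotOlderThan,
        SendInitialEvents:    ptr.To(true),
        AllowWatchBookmarks:  true,
    }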
The test failures seem real - please fix
)

func TestReflectorWatchListFallback(t *testing.T) {
	t.Skipf("test")
Please fix - we don't want to commit it.
first we need #132479
force-pushed from adccf59 to f1b991a
target := cache.NewReflector(lw, &v1.Secret{}, store, time.Duration(0))
target.UseWatchList = ptr.To(true)
clientfeaturestesting.SetFeatureDuringTest(t, clientfeatures.WatchListClient, false)
This is somewhat weird to me - what are we trying to achieve here?
Also - isn't it racy by definition?
Setting the FG also affects the server (the second server and the informers used by the second server); we wanted to enable the FG only for the informer.
OK - and this is relying on the fact that we're instantiating it as part of reflector creation here:
https://github.com/kubernetes/kubernetes/pull/132453/files#diff-9ccdf713e010f73dbebd01e936cb0077fc63e4f5ab941d865ded42da219d84ecR293
Let's maybe try to structure that more cleanly and add a comment like:
var target *Reflector
func() {
	// Enable WatchListClient only for this reflector.
	// We rely on the fact that the decision whether watchlist is used
	// is made once, during reflector creation in NewReflectorWithOptions.
	clientfeaturestesting.SetFeatureDuringTest(t, clientfeatures.WatchListClient, true)
	defer clientfeaturestesting.SetFeatureDuringTest(t, clientfeatures.WatchListClient, false)
	target = cache.NewReflector(lw, &v1.Secret{}, store, time.Duration(0))
}()
mhm, I am not sure why wrapping it into a function is cleaner (?). Setting a FG during a test is serial, there is no race.
OK - and this is relying on the fact that we're instantiating it as part of reflector creation here:
https://github.com/kubernetes/kubernetes/pull/132453/files#diff-9ccdf713e010f73dbebd01e936cb0077fc63e4f5ab941d865ded42da219d84ecR293
yes, we can add a comment to clarify.
I think we need one more fix.
force-pushed from f1b991a to 4437a62
@p0lyn0mial: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/lgtm
LGTM label has been added. Git tree hash: 75f3d65910a221c0eece1318601d050c36c43397
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: p0lyn0mial, wojtek-t
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Which issue(s) this PR is related to:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: