Cherry-pick of #120897 #123935 #123887 #123994: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage. #123973

serathius · 2024-03-18T14:23:53Z

Cherry pick of #120897 #123935 #123887 #123994 on release-1.29.

#120897: Ensure that initial events are sorted for WatchList
#123935: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage
#123887: apiserver/storage/cacher: decrease the running time of tests in the cacher package
#123994: Undo double run of the TestWatchSemantics test to avoid hitting timeout

For details on the cherry pick process, see the cherry pick requests page.

kube-apiserver: fixes a 1.27+ regression in watch stability by serving watch requests without a resourceVersion from the watch cache by default, as in <1.27 (disabling the change in #115096 by default). This mitigates the impact of an etcd watch bug (https://github.com/etcd-io/etcd/pull/17555). If the 1.27 change in #115096 to serve these requests from underlying storage is still desired despite the impact on watch stability, it can be re-enabled with a `WatchFromStorageWithoutResourceVersion` feature gate.
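The routing described in the release note can be sketched roughly as follows. This is a simplified illustration, not the cacher's actual code: `watchSource`, `chooseWatchSource`, and the `gateEnabled` parameter are hypothetical names, with `gateEnabled` standing in for the `WatchFromStorageWithoutResourceVersion` feature gate, and the sketch ignores every other condition the real code checks.

```go
package main

import "fmt"

// watchSource names where a watch request is served from.
// Hypothetical type, used only for this illustration.
type watchSource string

const (
	fromCache   watchSource = "watch cache"
	fromStorage watchSource = "underlying storage"
)

// chooseWatchSource sketches the decision from the release note.
// resourceVersion is the value sent with the watch request ("" means unset).
// gateEnabled stands in for the WatchFromStorageWithoutResourceVersion
// feature gate, which is off by default after this change.
func chooseWatchSource(resourceVersion string, gateEnabled bool) watchSource {
	// Only a watch without a resourceVersion is affected by the gate.
	if resourceVersion == "" && gateEnabled {
		return fromStorage
	}
	return fromCache
}

func main() {
	fmt.Println(chooseWatchSource("", false))     // default: watch cache
	fmt.Println(chooseWatchSource("", true))      // gate re-enabled: underlying storage
	fmt.Println(chooseWatchSource("12345", true)) // resourceVersion set: watch cache
}
```

Feature gates on kube-apiserver are normally set with the `--feature-gates` flag, so re-enabling the 1.27 behavior would presumably look like `--feature-gates=WatchFromStorageWithoutResourceVersion=true`.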

liggitt · 2024-03-18T14:31:36Z

/lgtm
/approve

k8s-ci-robot · 2024-03-18T14:31:43Z

LGTM label has been added.

Git tree hash: 4bc6688458dc91c1d378c62f6077f48994df2d31

k8s-ci-robot · 2024-03-18T14:32:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/features/OWNERS~~ [liggitt]
~~staging/src/k8s.io/apiserver/pkg/features/OWNERS~~ [liggitt]
~~staging/src/k8s.io/apiserver/pkg/storage/OWNERS~~ [liggitt]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

serathius · 2024-03-18T15:10:54Z

timeout on tests :(
/retest

liggitt · 2024-03-18T15:16:02Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_test.go

+ defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.WatchFromStorageWithoutResourceVersion, false)()
+ store, terminate := testSetupWithEtcdAndCreateWrapper(t)
+ t.Cleanup(terminate)
+ storagetesting.RunWatchSemantics(context.TODO(), t, store)


doesn't this test take a long time? does running it twice now exceed package testing time?

Yes, it takes 42s :(

Added 92bdc7b and #123887 to reduce test runtime

need to fix the test in master first ... the doubling of RunWatchSemantics has made this package timeout ~50% of runs since https://github.com/kubernetes/kubernetes/pull/123935/files#r1530224210 merged

I expect that this is related to additional tests present on the main branch. Where can I check the current test runtime?

Sent #123994 and included it into this PR.

Can you keep each commit picked from master distinct, and update the description with all the master PRs now included in this PR?

There are 2 separate commits in the PR, one for #123935 and second for #123994. Updated the PR title and description to mention both PRs. Is this ok?

Added 92bdc7b and #123887 to reduce test runtime

I thought you meant you were pulling in these commits to this PR... was that not the case?

I assumed that just undoing the second run of TestWatchSemantics might get us below timeout. Pulled all the fixes to be safe.

…acher package. It turns out that kube has a custom test timeout of 3 minutes. The tests in the cacher package use nearly all of that time and are being terminated, resulting in failing jobs. Before the change, TestWatchSemantics took ~43s to run; with this simple change, it now takes ~18s. When we created the tests, we didn't measure the running time and assumed that waiting 1 second on a watch channel to make sure no more events are received was sufficient. This PR decreases the waiting time to 300 milliseconds. Modern computers can perform many tasks within that time. In addition, the tests are serial in nature, meaning that no other actor could add items to the database and cause new items to be received. After the change, the total running time decreased by 17%: the tests needed ~176s before and need ~146s after. The changes also improved TestWatchSemanticInitialEventsExtended.

Jefftree · 2024-03-19T18:51:41Z

/triage accepted

serathius · 2024-03-19T20:50:43Z

/king bug

serathius · 2024-03-19T22:53:17Z

/retest

serathius · 2024-03-20T07:15:14Z

/retest

liggitt · 2024-03-20T13:39:16Z

/lgtm
cc @kubernetes/release-managers

@serathius can you propagate this to 1.28 and 1.27 now that CI is green?

k8s-ci-robot · 2024-03-20T13:39:23Z

LGTM label has been added.

Git tree hash: 50e77ffc62a59e052defbd75fe7d7fd9da8d959e

serathius · 2024-03-20T14:13:27Z

Done, #124006 #124007

jeremyrickard · 2024-03-20T21:23:15Z

/cherry-pick-approved

k8s-ci-robot added the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Mar 18, 2024

k8s-ci-robot added this to the v1.29 milestone Mar 18, 2024

k8s-ci-robot requested review from dims and mikedanese March 18, 2024 14:24

k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 18, 2024

k8s-ci-robot assigned liggitt Mar 18, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 18, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 18, 2024

liggitt reviewed Mar 18, 2024

wojtek-t and others added 3 commits March 19, 2024 08:52

Ensure that initial events are sorted for WatchList

ff2189b

Serve watch without resourceVersion from cache and introduce a WatchF…

f8f0854

…romStorageWithoutResourceVersion feature gate to allow serving watch from storage.

serathius force-pushed the consistent-watch-from-etcd-1.29 branch from d8ae9f8 to d9ca300 Compare March 19, 2024 08:07

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 19, 2024

k8s-ci-robot requested a review from liggitt March 19, 2024 08:07

serathius mentioned this pull request Mar 19, 2024

Undo double run of the TestWatchSemantics test to avoid hitting timeout #123994

Merged

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 19, 2024

k8s-ci-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Mar 19, 2024

Undo double run of the TestWatchSemantics test to avoid hitting timeout

cf2a337

serathius force-pushed the consistent-watch-from-etcd-1.29 branch from 54bbd36 to cf2a337 Compare March 19, 2024 22:17

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 19, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 20, 2024

liggitt added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 20, 2024

liggitt added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 20, 2024

k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Mar 20, 2024

jeremyrickard added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Mar 20, 2024

k8s-ci-robot removed the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Mar 20, 2024

k8s-ci-robot merged commit 6a9602d into kubernetes:release-1.29 Mar 20, 2024
16 checks passed

tooptoop4 mentioned this pull request Jun 2, 2024

Cluster syncs hang (syncs never complete) argoproj/argo-cd#18467

Open

3 tasks
