Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick of #120897 #123935 #123887 #123994: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage. #123973

Merged

Conversation

serathius
Copy link
Contributor

@serathius serathius commented Mar 18, 2024

Cherry pick of #120897 #123935 #123887 #123994 on release-1.29.

#120897: Ensure that initial events are sorted for WatchList
#123935: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage
#123887: apiserver/storage/cacher: decrease the running time of tests in the cacher package
#123994: Undo double run of the TestWatchSemantics test to avoid hitting timeout

For details on the cherry pick process, see the cherry pick requests page.

kube-apiserver: fixes a 1.27+ regression in watch stability by serving watch requests without a resourceVersion from the watch cache by default, as in <1.27 (disabling the change in #115096 by default). This mitigates the impact of an etcd watch bug (https://github.com/etcd-io/etcd/pull/17555). If the 1.27 change in #115096 to serve these requests from underlying storage is still desired despite the impact on watch stability, it can be re-enabled with a `WatchFromStorageWithoutResourceVersion` feature gate.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Mar 18, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Mar 18, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 18, 2024
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 18, 2024
@liggitt
Copy link
Member

liggitt commented Mar 18, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 18, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 4bc6688458dc91c1d378c62f6077f48994df2d31

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 18, 2024
@serathius
Copy link
Contributor Author

timeout on tests :(
/retest

defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.WatchFromStorageWithoutResourceVersion, false)()
store, terminate := testSetupWithEtcdAndCreateWrapper(t)
t.Cleanup(terminate)
storagetesting.RunWatchSemantics(context.TODO(), t, store)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this test take a long time? does running it twice now exceed package testing time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it takes 42s :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 92bdc7b and #123887 to reduce test runtime

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to fix the test in master first ... the doubling of RunWatchSemantics has made this package timeout ~50% of runs since https://github.com/kubernetes/kubernetes/pull/123935/files#r1530224210 merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that this is related to additional tests present on the main branch. Where I can check the current test runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent #123994 and included it into this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep each commit picked from master distinct, and update the description with all the master PRs now included in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 separate commits in the PR, one for #123935 and second for #123994. Updated the PR title and description to mention both PRs. Is this ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 92bdc7b and #123887 to reduce test runtime

I thought you meant you were pulling in these commits to this PR... was that not the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that just undoing the second run of TestWatchSemantics might get us below timeout. Pulled all the fixes to be safe.

wojtek-t and others added 3 commits March 19, 2024 08:52
…romStorageWithoutResourceVersion feature gate to allow serving watch from storage.
…acher package.

It turns out that kube has a custom timeout for tests of 3 minutes.
The tests in the cacher package are utilizing nearly the
entire time and are being terminated, resulting in failing jobs.

Before the change, the TestWatchSemantics took ~43s to run. With this simple change, it now takes ~18s.

When we created the tests, we didn't measure the running time and assumed that waiting 1 second on a watch channel
to make sure no more events are received was sufficient.
This PR decreases the waiting time to 300 milliseconds.
Modern computers can perform many tasks within that time.
In addition to that, the tests are serial in nature, meaning that there is no other
actor that could add items to the database, which could result in receiving new items.

After the change the total running time decreased by 17%.
Before the tests needed ~176s after they need ~146s.
The changes also improved TestWatchSemanticInitialEventsExtended.
@serathius serathius force-pushed the consistent-watch-from-etcd-1.29 branch from d8ae9f8 to d9ca300 Compare March 19, 2024 08:07
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 19, 2024
@Jefftree
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 19, 2024
@serathius serathius changed the title Cherry-pick of #123935: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage. Cherry-pick of #123935 #123994: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage. Mar 19, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Mar 19, 2024
@serathius
Copy link
Contributor Author

/king bug

@serathius serathius force-pushed the consistent-watch-from-etcd-1.29 branch from 54bbd36 to cf2a337 Compare March 19, 2024 22:17
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 19, 2024
@serathius serathius changed the title Cherry-pick of #123935 #123994: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage. Cherry-pick of #120897 #123935 #123887 #123994: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage. Mar 19, 2024
@serathius
Copy link
Contributor Author

/retest

1 similar comment
@serathius
Copy link
Contributor Author

/retest

@liggitt
Copy link
Member

liggitt commented Mar 20, 2024

/lgtm
cc @kubernetes/release-managers

@serathius can you propagate this to 1.28 and 1.27 now that CI is green?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 20, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 50e77ffc62a59e052defbd75fe7d7fd9da8d959e

@serathius
Copy link
Contributor Author

Done, #124006 #124007

@liggitt liggitt added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 20, 2024
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 20, 2024
@liggitt liggitt added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 20, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Mar 20, 2024
@jeremyrickard
Copy link
Contributor

/cherry-pick-approved

@jeremyrickard jeremyrickard added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Mar 20, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Mar 20, 2024
@k8s-ci-robot k8s-ci-robot merged commit 6a9602d into kubernetes:release-1.29 Mar 20, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

7 participants