Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude nodes from rolling update depending on tolerations #119317

Merged

Conversation

mochizuki875
Copy link
Member

@mochizuki875 mochizuki875 commented Jul 14, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

In this PR, exclude Nodes from daemon pod updating if the tolerations of it dose not match the taints of Nodes.
It will cause update stuck if we try to rolling update DaemonSet which has non zero maxSurge and dose not have tolerations matching Nodes taints.
It's because all Nodes running Pods belonging to the previous DaemonSet are included in the update, regardless of the tolerations of the new DaemonSet.

Which issue(s) this PR fixes:

Fixes #118823

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Revised the logic for DaemonSet rolling update to exclude nodes if scheduling constraints are not met.
This eliminates the problem of rolling updates to a DaemonSet getting stuck around tolerations.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 14, 2023
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 14, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @mochizuki875. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mochizuki875
Copy link
Member Author

/sig apps
/cc @atiratree

pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update_test.go Show resolved Hide resolved
@atiratree
Copy link
Member

/ok-to-test
/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 24, 2023
@atiratree
Copy link
Member

@mochizuki875

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 9, 2023
@mochizuki875 mochizuki875 force-pushed the fix_ds_rolling_update_118823 branch 2 times, most recently from f113af4 to decc495 Compare August 9, 2023 07:23
@mochizuki875
Copy link
Member Author

/retest

pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
pkg/controller/daemon/update.go Outdated Show resolved Hide resolved
@mochizuki875
Copy link
Member Author

/retest

1 similar comment
@candita
Copy link

candita commented Sep 19, 2023

/retest

@candita
Copy link

candita commented Sep 19, 2023

/test pull-kubernetes-e2e-gce

@mochizuki875
Copy link
Member Author

/retest

1 similar comment
@mochizuki875
Copy link
Member Author

/retest

@atiratree
Copy link
Member

@mochizuki875 would you mind cherry-picking this into 1.28, 1.27, 1.26 ,1.25? We have promoted the maxSurge feature to GA in 1.25 so it would be great to get the backports into all of these versions (#111194).

@mochizuki875
Copy link
Member Author

@atiratree
Thank you for your notification!
I'll do it:)

@atiratree
Copy link
Member

thanks!

@mochizuki875
Copy link
Member Author

@atiratree
OK, I've created PR for cherry-picking into four versions.

@atiratree
Copy link
Member

thanks, let's wait for the tests and then we can tag them

k8s-ci-robot added a commit that referenced this pull request Sep 22, 2023
…-#119317-upstream-release-1.26

Automated cherry pick of #119317: change rolling update logic to exclude sunsetting nodes
k8s-ci-robot added a commit that referenced this pull request Sep 22, 2023
…-#119317-upstream-release-1.28

Automated cherry pick of #119317: change rolling update logic to exclude sunsetting nodes
k8s-ci-robot added a commit that referenced this pull request Sep 22, 2023
…-#119317-upstream-release-1.27

Automated cherry pick of #119317: change rolling update logic to exclude sunsetting nodes
k8s-ci-robot added a commit that referenced this pull request Sep 22, 2023
…-#119317-upstream-release-1.25

Automated cherry pick of #119317: change rolling update logic to exclude sunsetting nodes
@sftim
Copy link
Contributor

sftim commented Sep 26, 2023

Changelog suggestion

Revised the logic for DaemonSet rolling update to exclude nodes if scheduling constraints are not met.
This eliminates the problem of rolling updates to a DaemonSet getting stuck around tolerations.

Further polish is welcome; I don't think I really understand the effect of the change from the current changelog entry.

@fsmunoz
Copy link
Contributor

fsmunoz commented Sep 26, 2023

@sftim I'm picking your suggestion for the alpha.1 draft PR, I was having some difficulties as well (speaking as Release Notes lead for the 1.29 release).

@mochizuki875
Copy link
Member Author

mochizuki875 commented Sep 26, 2023

@sftim
Thank you for your suggestion! (Sorry, I'm not very good at English...:(
That's make sense and I've fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DaemonSet fails to scale down during the rolling update when maxUnavailable=0
7 participants