Add DisruptionTarget condition when preempting for critical pod #117586

Merged
k8s-ci-robot merged 1 commit into kubernetes:master on May 23, 2023

Conversation

mimowo
Contributor

@mimowo mimowo commented Apr 25, 2023

What type of PR is this?

/kind feature
/kind bug

What this PR does / why we need it:

This PR adds the DisruptionTarget condition to annotate pod disruption caused by preemption that the Kubelet initiates to make room for a critical pod.
The analogous preemption scenario in Kube-scheduler is already covered by the DisruptionTarget condition.
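
For readers unfamiliar with the condition, here is a minimal, self-contained sketch of the idea, not the actual diff. The types are simplified stand-ins for v1.PodStatus and v1.PodCondition, the helper names (setCondition, markPreempted) are hypothetical, and the reason string and message wording are assumptions for illustration only.

```go
package main

import "fmt"

// PodCondition and PodStatus are simplified stand-ins for the Kubernetes
// v1.PodCondition and v1.PodStatus types (only the fields needed here).
type PodCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

type PodStatus struct {
	Phase      string
	Conditions []PodCondition
}

// setCondition replaces an existing condition of the same Type, or appends it.
func setCondition(status *PodStatus, cond PodCondition) {
	for i := range status.Conditions {
		if status.Conditions[i].Type == cond.Type {
			status.Conditions[i] = cond
			return
		}
	}
	status.Conditions = append(status.Conditions, cond)
}

// markPreempted is a hypothetical helper: it marks the pod as failed and, when
// disruption conditions are enabled, records that the pod was a disruption target.
func markPreempted(status *PodStatus, conditionsEnabled bool) {
	status.Phase = "Failed"
	if conditionsEnabled {
		setCondition(status, PodCondition{
			Type:   "DisruptionTarget",
			Status: "True",
			// Assumed reason string and illustrative message wording.
			Reason:  "TerminationByKubelet",
			Message: "Pod was preempted by Kubelet to accommodate a critical pod.",
		})
	}
}

func main() {
	status := &PodStatus{Phase: "Running"}
	markPreempted(status, true)
	fmt.Printf("%s %+v\n", status.Phase, status.Conditions)
}
```

The sketch gates the condition on a boolean to mirror how such behavior is typically placed behind a feature gate; calling markPreempted twice leaves a single DisruptionTarget entry because setCondition updates in place.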

Which issue(s) this PR fixes:

Special notes for your reviewer:

The scenario of preemption by Kubelet to make room for a critical pod was overlooked during earlier phases of the development of pod failure policy, so it can be considered a bug.
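
To illustrate why the missing condition matters for pod failure policy, here is a rough sketch of a consumer that ignores failures of pods marked as disruption targets. It is an assumption-laden simplification: the PodCondition type is a stand-in, and isDisruptionTarget and countedFailures are hypothetical helpers, not the Job controller's code.

```go
package main

import "fmt"

// PodCondition is a simplified stand-in for v1.PodCondition.
type PodCondition struct {
	Type   string
	Status string
}

// isDisruptionTarget reports whether the conditions include
// DisruptionTarget with status "True".
func isDisruptionTarget(conditions []PodCondition) bool {
	for _, c := range conditions {
		if c.Type == "DisruptionTarget" && c.Status == "True" {
			return true
		}
	}
	return false
}

// countedFailures is a hypothetical helper: given the conditions of each
// failed pod, it counts only the failures that were not caused by a disruption,
// similar in spirit to a failure policy rule that ignores DisruptionTarget.
func countedFailures(failedPods [][]PodCondition) int {
	n := 0
	for _, conds := range failedPods {
		if !isDisruptionTarget(conds) {
			n++
		}
	}
	return n
}

func main() {
	failed := [][]PodCondition{
		{{Type: "DisruptionTarget", Status: "True"}}, // preempted pod
		{{Type: "Ready", Status: "False"}},           // ordinary failure
	}
	fmt.Println(countedFailures(failed)) // prints 1
}
```

Without the condition, the preempted pod in this sketch would be indistinguishable from an ordinary failure and would be counted.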

The test appears stable: it was repeated over 100 iterations with no failures.

Fixing this issue is covered in the KEP update: kubernetes/enhancements#3965

Does this PR introduce a user-facing change?

Add DisruptionTarget condition to the pod preempted by Kubelet to make room for a critical pod

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures 

@k8s-ci-robot added the do-not-merge/work-in-progress, release-note, kind/feature, size/M, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority, area/kubelet, area/test, sig/node, and sig/testing labels, and removed the do-not-merge/needs-sig label, Apr 25, 2023
@mimowo
Contributor Author

mimowo commented Apr 25, 2023

/test pull-kubernetes-verify

@mimowo
Contributor Author

mimowo commented Apr 25, 2023

/sig node

@mimowo
Contributor Author

mimowo commented Apr 25, 2023

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 25, 2023
@mimowo
Contributor Author

mimowo commented Apr 25, 2023

@mimowo changed the title from "[WIP] Add DisruptionTarget condition when preempting for critical pod" to "Add DisruptionTarget condition when preempting for critical pod" on Apr 25, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 25, 2023
@mimowo
Contributor Author

mimowo commented Apr 25, 2023

Do you think this could go in as a bug fix, bypassing the KEP update phase, and be backported to 1.26 and 1.27?

On one hand, it looks like a bug (an omitted scenario). On the other hand, because we didn't include the scenario in the KEP, we may need to update the KEP first and then fix it for 1.28.

@mimowo
Contributor Author

mimowo commented Apr 26, 2023

Added to the KEP update for 1.28: kubernetes/enhancements#3965 for Third Beta.

@bart0sh
Contributor

bart0sh commented Apr 26, 2023

/triage accepted
/priority important-soon
/assign @SergeyKanzhelev @smarterclayton

@mimowo
Contributor Author

mimowo commented May 2, 2023

FYI: pull-kubernetes-node-kubelet-serial-pod-disruption-conditions fails on the PID-based eviction test for a reason not directly related to disruption conditions; the eviction tests themselves are currently failing: https://testgrid.k8s.io/sig-node-containerd#node-kubelet-containerd-eviction.

Member

@endocrimes endocrimes left a comment


oh eviction tests, my old "friends".

Looks like the new test passes at least. One small nit on the wording of the condition, otherwise LGTM

pkg/kubelet/preemption/preemption.go (outdated, resolved)
@mimowo mimowo force-pushed the preemption-for-critical-pods branch from 631fc3d to e1e3814 Compare May 2, 2023 16:53
Member

@endocrimes endocrimes left a comment


/approve
^ for the test

actual change is also a lgtm

@endocrimes endocrimes moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage May 2, 2023
@mimowo
Contributor Author

mimowo commented May 2, 2023

/test pull-kubernetes-e2e-gce
Retest unrelated failure

@k8s-ci-robot
Contributor

k8s-ci-robot commented May 2, 2023

@mimowo: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-node-kubelet-serial-pod-disruption-conditions | faf68967dbf710fea79053550444755050597db6 | link | false | /test pull-kubernetes-node-kubelet-serial-pod-disruption-conditions |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mimowo
Contributor Author

mimowo commented May 3, 2023

/test pull-kubernetes-e2e-gce

@pacoxu
Member

pacoxu commented May 9, 2023

> Added to the KEP update for 1.28: kubernetes/enhancements#3965 for Third Beta.

I wondered if we need to wait for the KEP update to be merged as this PR is an implementation of the KEP update.

@mimowo
Contributor Author

mimowo commented May 9, 2023

> I wondered if we need to wait for the KEP update to be merged as this PR is an implementation of the KEP update.

I would like to handle it as a bug and merge without waiting for the KEP, even though the scenario wasn't listed explicitly in previous versions of the KEP. In any case, I am preparing the KEP update as well, to make sure it aligns with the implementation (or, in case we want to go through the regular KEP process, the implementation steps).

@bobbypage
Member

bobbypage commented May 19, 2023

lgtm. I agree that this is a much smaller change and can be captured as a bug rather than a feature (and potentially cherry-picked if needed), since it sounds like preemption by the kubelet to admit a critical pod was a previously missed case. It would be great to get other folks' thoughts on this.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 2536e1973b244c65398a45839f1793ec5997be49

@mimowo
Contributor Author

mimowo commented May 22, 2023

/assign @SergeyKanzhelev @dchen1107

@dchen1107
Member

/lgtm
/approve

I agree to treat this as a bug in the pod failure policy. Please send us the backport PRs.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, endocrimes, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 23, 2023
@k8s-ci-robot k8s-ci-robot merged commit 8b8dfca into kubernetes:master May 23, 2023
13 checks passed
SIG Node CI/Test Board automation moved this from Archive-it to Done May 23, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done May 23, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone May 23, 2023
k8s-ci-robot added a commit that referenced this pull request Jun 1, 2023
…86-upstream-release-1.26

Automated cherry pick of #117586: Add DisruptionTarget condition when preempting for critical
k8s-ci-robot added a commit that referenced this pull request Jun 1, 2023
…86-upstream-release-1.27

Automated cherry pick of #117586: Add DisruptionTarget condition when preempting for critical
@mimowo mimowo deleted the preemption-for-critical-pods branch November 29, 2023 15:02
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

9 participants