-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proceed with deletion of rejected pods after Kubelet restart #118497
Proceed with deletion of rejected pods after Kubelet restart #118497
Conversation
eafe760
to
bbefcb4
Compare
/test pull-kubernetes-e2e-kind-ipv6 |
/triage accepted |
bbefcb4
to
540094c
Compare
339668f
to
17013d3
Compare
/test pull-kubernetes-e2e-gce-serial |
@bobbypage @smarterclayton ready for another pass. I've tested the latest version with 50 repeats of the new tests locally and there were no failures. |
/test pull-kubernetes-e2e-kind-ipv6 |
/test pull-kubernetes-node-kubelet-serial-containerd |
1 similar comment
/test pull-kubernetes-node-kubelet-serial-containerd |
@mimowo: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Looks good to me. I don't have any substantitive comments on the tests, other than "you saw them fail if your fix is commented out"? |
The first test, for the main scenario, fails consistently without the fix. The second, for the non-admissible runtime pod, passes as it was already covered by HandlePodCleanups, but it seems valuable to test this non-obvious modification of the main scenario. |
Also lgtm on the changes, the change to use pod worker update with The only thing I saw when testing out the changes here is there is a ~10s delay for the pod from the pod being rejected at kubelet to the pod being deleted on the apiserver. This is because the following flow happens: (1) The pod is rejected by kubelet at admission and the local status manager is updated with terminal phase IMO, the 10s latency is not big deal... it will reconcile eventually, but if we wanted it to update faster and reduce latency, I think if we also called So maybe long term, we should do both, i.e. call |
/lgtm |
LGTM label has been added. Git tree hash: 9cb115ee3ae7a5f372aa9f07838b0b6508db666c
|
Thanks @bobbypage for the summary. The scenario is very unlikely, so I don't think 10s makes a difference, but it is a judgment call. I have looked if there are other scenarios possible to arrive with pods which satisfy the conditions: not runtime pods, terminal, with deletion timestamp and not in pod workers, but the scenario reported is the only I could come up with. This is probably because So, this is why the fix https://github.com/kubernetes/kubernetes/pull/118614/files (to call SyncPodKIll from rejectPod( should also work. As for which one is better, the margins are small. The other fix reduces latency, but is specific to rejected pods, so potentially there might be a scenario we couldn't think about, which somehow bypasses IIUC if we combine the two fixes, then the code in Let me know what you think, I'm out next week and will be back on the 5th of July. |
Thanks @mimowo for the explanation, +1, I agree this scenario is unlikely to be hit and the 10s latency is not critical, so I'm lgtm to go with this approach. |
Since we plan on pulling admission closer to pod worker anyway, rejection probably becomes a pod worker responsibility and that should fix the issue (but we still need resync for restarts anyway). /lgtm too |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mimowo, smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…18497-upstream-release-1.27 Automated cherry pick of #118497: Fix the deletion of rejected pods
Hi Team, Same issue here, may i know the issue fixed in which release? |
This was fixed in 1.28 (this PR), and cherry-picked on 1.27.4 in this PR #118841. See in release notes: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#changelog-since-v1273 "Fix deletion of non-admissible pods that are deleted during Kubelet restart (#118841, @bobbypage) [SIG Node and Testing]" |
Got it, thanks for your time and have a good day. |
What type of PR is this?
/kind bug
/kind regression
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #118472
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: