Fix: Static pod status is always Init:0/1 if unable to get init container status #131317

Open
wants to merge 1 commit into
base: master
from

Conversation

bitoku
Contributor

@bitoku bitoku commented Apr 15, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR recreates the older PRs #122897 and #108583.

A static pod has no pod status saved anywhere except in its mirror pod. If an init container is GC'd and the kubelet then restarts, the static pod can only regenerate its pod status from the current container runtime state. Since #96572 was merged, init containers are not re-created in this situation, so the kubelet needs to reset the init container status for static pods correctly.
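
The reasoning in this description can be sketched in isolation. The following is a minimal, self-contained Go illustration, not the kubelet's actual code: the types `containerState` and `runtimeStatus` and the function `resolveInitState` are hypothetical names introduced only for this example. It assumes an ordinary (non-restartable) init container, and shows the idea of reporting such a container as completed when the runtime no longer has any record of it but at least one regular container has already been created.

```go
package main

import "fmt"

// containerState is a hypothetical, simplified stand-in for a container's reported state.
type containerState string

const (
	stateWaiting   containerState = "Waiting"
	stateCompleted containerState = "Completed"
)

// runtimeStatus is a hypothetical snapshot of what the container runtime reports.
type runtimeStatus struct {
	// initStatuses maps init container name to its state. A missing key means the
	// runtime has no record of the container (for example, it was garbage-collected).
	initStatuses map[string]containerState
	// regularCreated is true if any regular (non-init) container has been created.
	regularCreated bool
}

// resolveInitState returns the state to report for the named init container.
func resolveInitState(name string, rs runtimeStatus) containerState {
	if s, ok := rs.initStatuses[name]; ok {
		// The runtime still knows the container, so its own state wins.
		return s
	}
	if rs.regularCreated {
		// Assumption: regular containers are created only after ordinary init
		// containers have finished, so a missing record implies earlier success.
		return stateCompleted
	}
	// Nothing has run yet; the init container is genuinely still pending.
	return stateWaiting
}

func main() {
	afterGC := runtimeStatus{initStatuses: map[string]containerState{}, regularCreated: true}
	fresh := runtimeStatus{initStatuses: map[string]containerState{}, regularCreated: false}
	fmt.Println(resolveInitState("init", afterGC))
	fmt.Println(resolveInitState("init", fresh))
}
```

Without the second branch, the pod in the `afterGC` case would keep reporting a waiting init container, which corresponds to the `Init:0/1` symptom this PR addresses.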

Which issue(s) this PR fixes:

Fixes #108537

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix static pod status always being Init:0/1 when the kubelet is unable to get the init container status from the container runtime.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 15, 2025
@k8s-ci-robot
Contributor

Please note that we're already in Test Freeze for the release-1.33 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.33.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Tue Apr 15 13:35:16 UTC 2025.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 15, 2025
@k8s-ci-robot
Contributor

Hi @bitoku. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 15, 2025
@HirazawaUi
Contributor

Do we have a reliable way to reproduce this issue and present the reproduction process in an e2e test case? Since this is a bug fix, we need to ensure that we’ve fixed it and that future changes won’t break this fix.

@bitoku
Contributor Author

bitoku commented Apr 17, 2025

@HirazawaUi

We have a reliable way to reproduce the issue.

#108537

How can we reproduce it (as minimally and precisely as possible)?
Remove the exited init container of one static pod.
Restart the kubelet.

But it requires a kubelet restart and direct container manipulation. Do we have existing tests that perform those operations?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 18, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bitoku
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Apr 18, 2025
@HirazawaUi
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 21, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 21, 2025
@HirazawaUi
Contributor

/test?

@k8s-ci-robot
Contributor

@HirazawaUi: The following commands are available to trigger required jobs:

/test pull-cos-containerd-e2e-ubuntu-gce
/test pull-kubernetes-cmd
/test pull-kubernetes-cmd-canary
/test pull-kubernetes-cmd-go-canary
/test pull-kubernetes-conformance-kind-ga-only-parallel
/test pull-kubernetes-coverage-unit
/test pull-kubernetes-dependencies
/test pull-kubernetes-dependencies-go-canary
/test pull-kubernetes-e2e-gce
/test pull-kubernetes-e2e-gce-100-performance
/test pull-kubernetes-e2e-gce-cos
/test pull-kubernetes-e2e-gce-cos-canary
/test pull-kubernetes-e2e-gce-cos-no-stage
/test pull-kubernetes-e2e-gce-network-proxy-http-connect
/test pull-kubernetes-e2e-gce-pull-through-cache
/test pull-kubernetes-e2e-gce-scale-performance-manual
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-integration
/test pull-kubernetes-integration-canary
/test pull-kubernetes-integration-go-canary
/test pull-kubernetes-kubemark-e2e-gce-scale
/test pull-kubernetes-node-e2e-containerd
/test pull-kubernetes-typecheck
/test pull-kubernetes-unit
/test pull-kubernetes-unit-go-canary
/test pull-kubernetes-update
/test pull-kubernetes-verify
/test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

/test check-dependency-stats
/test pull-crio-cgroupv1-node-e2e-eviction
/test pull-crio-cgroupv1-node-e2e-features
/test pull-crio-cgroupv1-node-e2e-hugepages
/test pull-crio-cgroupv1-node-e2e-resource-managers
/test pull-crio-cgroupv2-imagefs-separatedisktest
/test pull-crio-cgroupv2-node-e2e-eviction
/test pull-crio-cgroupv2-node-e2e-hugepages
/test pull-crio-cgroupv2-node-e2e-resource-managers
/test pull-crio-cgroupv2-splitfs-separate-disk
/test pull-e2e-gce-cloud-provider-disabled
/test pull-e2e-gci-gce-alpha-enabled-default
/test pull-kubernetes-apidiff
/test pull-kubernetes-apidiff-client-go
/test pull-kubernetes-conformance-image-test
/test pull-kubernetes-conformance-kind-ga-only
/test pull-kubernetes-conformance-kind-ipv6-parallel
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-crio-node-memoryqos-cgrpv2
/test pull-kubernetes-cross
/test pull-kubernetes-e2e-autoscaling-hpa-cm
/test pull-kubernetes-e2e-autoscaling-hpa-cpu
/test pull-kubernetes-e2e-autoscaling-hpa-cpu-alpha-beta
/test pull-kubernetes-e2e-capz-azure-disk
/test pull-kubernetes-e2e-capz-azure-disk-vmss
/test pull-kubernetes-e2e-capz-azure-disk-windows
/test pull-kubernetes-e2e-capz-azure-file
/test pull-kubernetes-e2e-capz-azure-file-vmss
/test pull-kubernetes-e2e-capz-azure-file-windows
/test pull-kubernetes-e2e-capz-conformance
/test pull-kubernetes-e2e-capz-master-windows-nodelogquery
/test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
/test pull-kubernetes-e2e-capz-windows-alpha-features
/test pull-kubernetes-e2e-capz-windows-master
/test pull-kubernetes-e2e-capz-windows-serial-slow
/test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
/test pull-kubernetes-e2e-containerd-gce
/test pull-kubernetes-e2e-ec2
/test pull-kubernetes-e2e-ec2-arm64
/test pull-kubernetes-e2e-ec2-conformance
/test pull-kubernetes-e2e-ec2-conformance-arm64
/test pull-kubernetes-e2e-ec2-device-plugin-gpu
/test pull-kubernetes-e2e-gce-canary
/test pull-kubernetes-e2e-gce-correctness
/test pull-kubernetes-e2e-gce-cos-alpha-features
/test pull-kubernetes-e2e-gce-csi-serial
/test pull-kubernetes-e2e-gce-device-plugin-gpu
/test pull-kubernetes-e2e-gce-disruptive-canary
/test pull-kubernetes-e2e-gce-kubelet-credential-provider
/test pull-kubernetes-e2e-gce-network-policies
/test pull-kubernetes-e2e-gce-network-proxy-grpc
/test pull-kubernetes-e2e-gce-serial
/test pull-kubernetes-e2e-gce-serial-canary
/test pull-kubernetes-e2e-gce-storage-disruptive
/test pull-kubernetes-e2e-gce-storage-selinux
/test pull-kubernetes-e2e-gce-storage-slow
/test pull-kubernetes-e2e-gce-storage-snapshot
/test pull-kubernetes-e2e-gci-gce-autoscaling
/test pull-kubernetes-e2e-gci-gce-ingress
/test pull-kubernetes-e2e-gci-gce-ipvs
/test pull-kubernetes-e2e-gci-gce-kube-dns-nodecache
/test pull-kubernetes-e2e-gci-gce-nftables
/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features
/test pull-kubernetes-e2e-kind-beta-features
/test pull-kubernetes-e2e-kind-canary
/test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer
/test pull-kubernetes-e2e-kind-dual-canary
/test pull-kubernetes-e2e-kind-evented-pleg
/test pull-kubernetes-e2e-kind-ipv6-canary
/test pull-kubernetes-e2e-kind-ipvs
/test pull-kubernetes-e2e-kind-kms
/test pull-kubernetes-e2e-kind-multizone
/test pull-kubernetes-e2e-kind-nftables
/test pull-kubernetes-e2e-relaxed-environment-variable-validation
/test pull-kubernetes-e2e-storage-kind-alpha-beta-features
/test pull-kubernetes-e2e-storage-kind-disruptive
/test pull-kubernetes-e2e-storage-kind-volume-group-snapshots
/test pull-kubernetes-kind-dra
/test pull-kubernetes-kind-dra-all
/test pull-kubernetes-kind-dra-all-canary
/test pull-kubernetes-kind-dra-canary
/test pull-kubernetes-kind-json-logging
/test pull-kubernetes-kind-text-logging
/test pull-kubernetes-kubemark-e2e-gce-big
/test pull-kubernetes-linter-hints
/test pull-kubernetes-local-e2e
/test pull-kubernetes-node-arm64-e2e-containerd-ec2
/test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
/test pull-kubernetes-node-arm64-ubuntu-serial-gce
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv2-e2e
/test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e
/test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e
/test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e
/test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial
/test pull-kubernetes-node-crio-e2e
/test pull-kubernetes-node-e2e-alpha-ec2
/test pull-kubernetes-node-e2e-containerd-1-7-dra
/test pull-kubernetes-node-e2e-containerd-1-7-dra-canary
/test pull-kubernetes-node-e2e-containerd-2-0-dra
/test pull-kubernetes-node-e2e-containerd-2-0-dra-canary
/test pull-kubernetes-node-e2e-containerd-alpha-features
/test pull-kubernetes-node-e2e-containerd-ec2
/test pull-kubernetes-node-e2e-containerd-features
/test pull-kubernetes-node-e2e-containerd-features-kubetest2
/test pull-kubernetes-node-e2e-containerd-kubelet-psi
/test pull-kubernetes-node-e2e-containerd-kubetest2
/test pull-kubernetes-node-e2e-containerd-serial-ec2
/test pull-kubernetes-node-e2e-containerd-serial-ec2-eks
/test pull-kubernetes-node-e2e-containerd-standalone-mode
/test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
/test pull-kubernetes-node-e2e-cri-proxy-serial
/test pull-kubernetes-node-e2e-crio-cgrpv1-dra
/test pull-kubernetes-node-e2e-crio-cgrpv1-dra-canary
/test pull-kubernetes-node-e2e-crio-cgrpv2-dra
/test pull-kubernetes-node-e2e-crio-cgrpv2-dra-canary
/test pull-kubernetes-node-e2e-resource-health-status
/test pull-kubernetes-node-kubelet-containerd-flaky
/test pull-kubernetes-node-kubelet-credential-provider
/test pull-kubernetes-node-kubelet-podresize
/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-cpu-manager
/test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
/test pull-kubernetes-node-kubelet-serial-hugepages
/test pull-kubernetes-node-kubelet-serial-memory-manager
/test pull-kubernetes-node-kubelet-serial-podresources
/test pull-kubernetes-node-kubelet-serial-topology-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
/test pull-kubernetes-node-swap-conformance-fedora-serial
/test pull-kubernetes-node-swap-conformance-ubuntu-serial
/test pull-kubernetes-node-swap-fedora
/test pull-kubernetes-node-swap-fedora-serial
/test pull-kubernetes-node-swap-ubuntu-serial
/test pull-kubernetes-scheduler-perf
/test pull-kubernetes-unit-windows-master
/test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

pull-kubernetes-cmd
pull-kubernetes-conformance-kind-ga-only-parallel
pull-kubernetes-dependencies
pull-kubernetes-e2e-ec2
pull-kubernetes-e2e-gce
pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
pull-kubernetes-e2e-kind
pull-kubernetes-e2e-kind-ipv6
pull-kubernetes-integration
pull-kubernetes-linter-hints
pull-kubernetes-node-e2e-containerd
pull-kubernetes-typecheck
pull-kubernetes-unit
pull-kubernetes-unit-windows-master
pull-kubernetes-verify

In response to this:

/test?

@HirazawaUi
Contributor

/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2

@bart0sh
Contributor

bart0sh commented Apr 22, 2025

/triage accepted

@bitoku, please fix the CI failures. Thanks!

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 22, 2025
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node: code and documentation PRs Apr 22, 2025
@bitoku
Contributor Author

bitoku commented Apr 22, 2025

/retest

@bitoku
Contributor Author

bitoku commented Apr 22, 2025

/retest

@bitoku
Contributor Author

bitoku commented Apr 22, 2025

All tests are green. Can you PTAL?

@SergeyKanzhelev
Member

/assign @haircommander

@SergeyKanzhelev SergeyKanzhelev moved this from Triage to PRs - Needs Reviewer in SIG Node CI/Test Board Apr 23, 2025
Comment on lines 2323 to 2332
if s == nil && kuberuntime.HasAnyRegularContainerCreated(pod, podStatus) && statuses[container.Name].State.Waiting != nil {
    statuses[container.Name].State = v1.ContainerState{
        Terminated: &v1.ContainerStateTerminated{
            Reason:   "Completed",
            Message:  "Unable to get init container status from container runtime and pod has been initialized, treat it as exited normally",
            ExitCode: 0,
        },
    }
    continue
}
Contributor

Should this be done only for static pods just to be on the safe side?

Contributor Author

done!
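
A static-pod-only guard of the kind requested in this thread could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `pod` is a simplified hypothetical type, and `isStaticPod` and `shouldAssumeInitCompleted` are hypothetical helpers. The sketch assumes that a pod's origin is recorded in the `kubernetes.io/config.source` annotation and that any recorded source other than `api` marks the pod as static.

```go
package main

import "fmt"

// configSourceAnnotation is the annotation key assumed to record where a pod came from.
const configSourceAnnotation = "kubernetes.io/config.source"

// pod is a hypothetical, simplified stand-in for a pod object.
type pod struct {
	annotations map[string]string
}

// isStaticPod is a hypothetical helper: a pod counts as static when it has a
// recorded source and that source is not the API server.
func isStaticPod(p pod) bool {
	src, ok := p.annotations[configSourceAnnotation]
	return ok && src != "api"
}

// shouldAssumeInitCompleted applies the guard: the "treat as completed" fallback
// is used only for static pods whose init container status is missing and whose
// regular containers have already been created.
func shouldAssumeInitCompleted(p pod, initStatusMissing, regularCreated bool) bool {
	return isStaticPod(p) && initStatusMissing && regularCreated
}

func main() {
	static := pod{annotations: map[string]string{configSourceAnnotation: "file"}}
	regular := pod{annotations: map[string]string{configSourceAnnotation: "api"}}
	fmt.Println(shouldAssumeInitCompleted(static, true, true))
	fmt.Println(shouldAssumeInitCompleted(regular, true, true))
}
```

Restricting the fallback this way keeps the behavior of API-server pods unchanged, which is the "safe side" the reviewer asks about.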

@bitoku bitoku force-pushed the fix-static-init branch from 23a34e4 to f5ea8ed Compare May 26, 2025 11:01
@bitoku
Contributor Author

bitoku commented May 30, 2025

/retest

@bitoku
Contributor Author

bitoku commented Jun 4, 2025

/test pull-kubernetes-node-crio-cgrpv2-e2e-canary

@bitoku
Contributor Author

bitoku commented Jun 4, 2025

/test pull-kubernetes-node-crio-cgrpv2-e2e-canary

@bitoku
Contributor Author

bitoku commented Jun 4, 2025

/test pull-kubernetes-node-crio-cgrpv2-e2e
/test pull-kubernetes-node-crio-cgrpv2-e2e-canary

…ner status from container runtime.

Signed-off-by: Ayato Tokubi <[email protected]>
@bitoku bitoku force-pushed the fix-static-init branch from f5ea8ed to c0add30 Compare June 10, 2025 03:16
@bitoku
Contributor Author

bitoku commented Jun 10, 2025

/test pull-kubernetes-node-crio-cgrpv2-e2e-canary

@bitoku
Contributor Author

bitoku commented Jun 10, 2025

/retest

@bitoku
Contributor Author

bitoku commented Jun 11, 2025

/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-crio-cgrpv2-e2e

@bitoku
Contributor Author

bitoku commented Jun 13, 2025

Hi! I addressed the comments. PTAL.

@bitoku
Contributor Author

bitoku commented Jun 24, 2025

/test pull-kubernetes-node-crio-cgrpv2-e2e-canary

@k8s-ci-robot
Contributor

@bitoku: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-node-crio-cgrpv2-e2e | c0add30 | link | false | /test pull-kubernetes-node-crio-cgrpv2-e2e |
| pull-kubernetes-node-kubelet-serial-containerd | c0add30 | link | false | /test pull-kubernetes-node-kubelet-serial-containerd |
| pull-kubernetes-node-crio-cgrpv2-e2e-canary | c0add30 | link | false | /test pull-kubernetes-node-crio-cgrpv2-e2e-canary |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: PRs - Needs Reviewer
Development

Successfully merging this pull request may close these issues.

Static pod status is always Init:0/1 when init container GC'd before kubelet restart.
7 participants