e2e: retry getting status on restart policy tests #132468

Open, 1 commit to be merged into base: master

haircommander
Contributor

@haircommander haircommander commented Jun 23, 2025

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

As the containers in a pod restart, the pod may enter exponential backoff, which delays the pod becoming ready. For this pod that is expected, but the test should not fail before the container has had a chance to start again, so the test now retries getting the status.

Which issue(s) this PR is related to:

Fixes #132425

Special notes for your reviewer:

Does this PR introduce a user-facing change?

none

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: haircommander
Once this PR has been reviewed and has the lgtm label, please assign endocrimes for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 23, 2025
As the containers in a pod restart, the pod may enter exponential backoff, which delays the pod becoming ready.
For this pod that is expected, but the test should not fail before the container has had a chance to start again.

Signed-off-by: Peter Hunt <[email protected]>
@haircommander
Contributor Author

I gathered this from this sequence:

I0622 19:51:38.675082    4546 status_manager.go:918] "Patch status for pod" pod="container-runtime-2143/terminate-cmd-rpa540523d0-c49c-48ec-8b20-4de41693eebd" podUID="2ffcd915-3267-4893-be48-0144cd5b8b75" patch="{\"metadata\":{\"uid\":\"2ffcd915-3267-4893-be48-0144cd5b8b75\"},\"status\":{\"conditions\":[{\"lastProbeTime\":null,\"lastTransitionTime\":\"2025-06-22T19:51:38Z\",\"status\":\"False\",\"type\":\"PodReadyToStartContainers\"},{\"lastProbeTime\":null,\"lastTransitionTime\":\"2025-06-22T19:51:38Z\",\"status\":\"True\",\"type\":\"Initialized\"},{\"lastProbeTime\":null,\"lastTransitionTime\":\"2025-06-22T19:51:38Z\",\"message\":\"containers with unready status: [terminate-cmd-rpa]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastProbeTime\":null,\"lastTransitionTime\":\"2025-06-22T19:51:38Z\",\"message\":\"containers with unready status: [terminate-cmd-rpa]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"},{\"lastProbeTime\":null,\"lastTransitionTime\":\"2025-06-22T19:51:38Z\",\"status\":\"True\",\"type\":\"PodScheduled\"}],\"containerStatuses\":[{\"image\":\"registry.k8s.io/e2e-test-images/busybox:1.37.0-1\",\"imageID\":\"\",\"lastState\":{},\"name\":\"terminate-cmd-rpa\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"waiting\":{\"reason\":\"ContainerCreating\"}},\"volumeMounts\":[{\"mountPath\":\"/restart-count\",\"name\":\"restart-count\"}]}],\"hostIP\":\"10.128.0.18\",\"hostIPs\":[{\"ip\":\"10.128.0.18\"}],\"startTime\":\"2025-06-22T19:51:38Z\"}}"
I0622 19:52:01.076466    4546 status_manager.go:918] "Patch status for pod" pod="container-runtime-2143/terminate-cmd-rpa540523d0-c49c-48ec-8b20-4de41693eebd" podUID="2ffcd915-3267-4893-be48-0144cd5b8b75" patch="{\"metadata\":{\"uid\":\"2ffcd915-3267-4893-be48-0144cd5b8b75\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"PodReadyToStartContainers\"},{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2025-06-22T19:52:01Z\",\"status\":\"True\",\"type\":\"PodReadyToStartContainers\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://7eb231f8044a6706a38973019669867d7203c45f84acc595becb9a5d24901b71\",\"image\":\"registry.k8s.io/e2e-test-images/busybox:1.37.0-1\",\"imageID\":\"3884f31e8b46c4aaad5bfeecabbdb9f1778bece67df660f5a304518c462e8ede\",\"lastState\":{},\"name\":\"terminate-cmd-rpa\",\"ready\":false,\"resources\":{},\"restartCount\":0,\"started\":false,\"state\":{\"waiting\":{\"message\":\"error reading from server: read unix @-\\u003e/var/run/crio/crio.sock: read: connection reset by peer\",\"reason\":\"RunContainerError\"}},\"user\":{\"linux\":{\"gid\":0,\"supplementalGroups\":[0,10],\"uid\":0}},\"volumeMounts\":[{\"mountPath\":\"/restart-count\",\"name\":\"restart-count\"}]}],\"podIP\":\"10.85.0.2\",\"podIPs\":[{\"ip\":\"10.85.0.2\"},{\"ip\":\"1100:200::2\"}]}}"
I0622 19:52:12.228994    4546 status_manager.go:918] "Patch status for pod" pod="container-runtime-2143/terminate-cmd-rpa540523d0-c49c-48ec-8b20-4de41693eebd" podUID="2ffcd915-3267-4893-be48-0144cd5b8b75" patch="{\"metadata\":{\"uid\":\"2ffcd915-3267-4893-be48-0144cd5b8b75\"},\"status\":{\"containerStatuses\":[{\"containerID\":\"cri-o://7eb231f8044a6706a38973019669867d7203c45f84acc595becb9a5d24901b71\",\"image\":\"registry.k8s.io/e2e-test-images/busybox:1.37.0-1\",\"imageID\":\"3884f31e8b46c4aaad5bfeecabbdb9f1778bece67df660f5a304518c462e8ede\",\"lastState\":{},\"name\":\"terminate-cmd-rpa\",\"ready\":false,\"resources\":{},\"restartCount\":0,\"started\":false,\"state\":{\"waiting\":{\"message\":\"Failed to inspect image \\\"\\\": rpc error: code = Unavailable desc = connection error: desc = \\\"transport: Error while dialing: dial unix /var/run/crio/crio.sock: connect: connection refused\\\"\",\"reason\":\"ImageInspectError\"}},\"user\":{\"linux\":{\"gid\":0,\"supplementalGroups\":[0,10],\"uid\":0}},\"volumeMounts\":[{\"mountPath\":\"/restart-count\",\"name\":\"restart-count\"}]}]}}"
I0622 19:52:13.645138    4546 status_manager.go:918] "Patch status for pod" pod="container-runtime-2143/terminate-cmd-rpa540523d0-c49c-48ec-8b20-4de41693eebd" podUID="2ffcd915-3267-4893-be48-0144cd5b8b75" patch="{\"metadata\":{\"uid\":\"2ffcd915-3267-4893-be48-0144cd5b8b75\"},\"status\":{\"containerStatuses\":[{\"containerID\":\"cri-o://8e9444bb32328ae976a576c33c3f17850b42e9e7814d449f395cb19932d20a23\",\"image\":\"registry.k8s.io/e2e-test-images/busybox:1.37.0-1\",\"imageID\":\"registry.k8s.io/e2e-test-images/busybox@sha256:0ffbe172f8d245c83f285c6992b452c53d085661e03ddfd3b484332026e6c8bb\",\"lastState\":{\"waiting\":{}},\"name\":\"terminate-cmd-rpa\",\"ready\":false,\"resources\":{},\"restartCount\":1,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://8e9444bb32328ae976a576c33c3f17850b42e9e7814d449f395cb19932d20a23\",\"exitCode\":1,\"finishedAt\":\"2025-06-22T19:52:12Z\",\"reason\":\"Error\",\"startedAt\":\"2025-06-22T19:52:12Z\"}},\"user\":{\"linux\":{\"gid\":0,\"supplementalGroups\":[0,10],\"uid\":0}},\"volumeMounts\":[{\"mountPath\":\"/restart-count\",\"name\":\"restart-count\"}]}],\"phase\":\"Running\"}}"
I0622 19:52:14.713177    4546 status_manager.go:918] "Patch status for pod" pod="container-runtime-2143/terminate-cmd-rpa540523d0-c49c-48ec-8b20-4de41693eebd" podUID="2ffcd915-3267-4893-be48-0144cd5b8b75" patch="{\"metadata\":{\"uid\":\"2ffcd915-3267-4893-be48-0144cd5b8b75\"},\"status\":{\"containerStatuses\":[{\"containerID\":\"cri-o://e8c7a4ea0110e7be110287b8028f03f20783e32ff3e7ed5712f21f111122dfea\",\"image\":\"registry.k8s.io/e2e-test-images/busybox:1.37.0-1\",\"imageID\":\"registry.k8s.io/e2e-test-images/busybox@sha256:0ffbe172f8d245c83f285c6992b452c53d085661e03ddfd3b484332026e6c8bb\",\"lastState\":{\"terminated\":{\"containerID\":\"cri-o://8e9444bb32328ae976a576c33c3f17850b42e9e7814d449f395cb19932d20a23\",\"exitCode\":1,\"finishedAt\":\"2025-06-22T19:52:12Z\",\"reason\":\"Error\",\"startedAt\":\"2025-06-22T19:52:12Z\"}},\"name\":\"terminate-cmd-rpa\",\"ready\":false,\"resources\":{},\"restartCount\":2,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://e8c7a4ea0110e7be110287b8028f03f20783e32ff3e7ed5712f21f111122dfea\",\"exitCode\":0,\"finishedAt\":\"2025-06-22T19:52:13Z\",\"reason\":\"Completed\",\"startedAt\":\"2025-06-22T19:52:13Z\"}},\"user\":{\"linux\":{\"gid\":0,\"supplementalGroups\":[0,10],\"uid\":0}},\"volumeMounts\":[{\"mountPath\":\"/restart-count\",\"name\":\"restart-count\"}]}]}}"

source

The unix socket errors are odd and may warrant investigation, but I think the point stands: we expect this pod to become ready soon.
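
For readers following the log sequence above, here is a small sketch of how the container state and restart count can be pulled out of each status patch. The `statusPatch` type and `summarize` function are hypothetical names introduced for illustration, and the input strings are trimmed copies of the `restartCount` and `state` fields from the logged patches, not the full payloads.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusPatch mirrors only the fields of the logged patch that are used here.
type statusPatch struct {
	Status struct {
		ContainerStatuses []struct {
			Name         string `json:"name"`
			RestartCount int    `json:"restartCount"`
			// State has a single key such as "waiting" or "terminated".
			State map[string]struct {
				Reason string `json:"reason"`
			} `json:"state"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

// summarize returns one "name restarts=N kind/reason" line per container state.
func summarize(patch string) ([]string, error) {
	var p statusPatch
	if err := json.Unmarshal([]byte(patch), &p); err != nil {
		return nil, err
	}
	var out []string
	for _, cs := range p.Status.ContainerStatuses {
		for kind, st := range cs.State {
			out = append(out, fmt.Sprintf("%s restarts=%d %s/%s", cs.Name, cs.RestartCount, kind, st.Reason))
		}
	}
	return out, nil
}

func main() {
	// Trimmed from the patches logged at 19:52:01, 19:52:12, 19:52:13 and 19:52:14.
	patches := []string{
		`{"status":{"containerStatuses":[{"name":"terminate-cmd-rpa","restartCount":0,"state":{"waiting":{"reason":"RunContainerError"}}}]}}`,
		`{"status":{"containerStatuses":[{"name":"terminate-cmd-rpa","restartCount":0,"state":{"waiting":{"reason":"ImageInspectError"}}}]}}`,
		`{"status":{"containerStatuses":[{"name":"terminate-cmd-rpa","restartCount":1,"state":{"terminated":{"reason":"Error"}}}]}}`,
		`{"status":{"containerStatuses":[{"name":"terminate-cmd-rpa","restartCount":2,"state":{"terminated":{"reason":"Completed"}}}]}}`,
	}
	for _, p := range patches {
		lines, err := summarize(p)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		for _, l := range lines {
			fmt.Println(l)
		}
	}
}
```

Read this way, the sequence shows two waiting states caused by the socket errors, followed by the container terminating with restart counts 1 and 2.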

@bart0sh
Contributor

bart0sh commented Jun 24, 2025

/retest
/triage accepted
/lgtm

/assign @SergeyKanzhelev @mrunalp
for approval

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 24, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: bc489081662fb0c07d58ee0762b827ac3a271798

@bart0sh bart0sh moved this from Triage to Needs Approver in SIG Node: code and documentation PRs Jun 24, 2025
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to PRs - Needs Reviewer in SIG Node CI/Test Board Jun 25, 2025
@SergeyKanzhelev SergeyKanzhelev moved this from PRs - Needs Reviewer to PRs Waiting on Author in SIG Node CI/Test Board Jun 25, 2025
@SergeyKanzhelev SergeyKanzhelev moved this from PRs Waiting on Author to PRs - Needs Approver in SIG Node CI/Test Board Jun 25, 2025
Labels
area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: PRs - Needs Approver
5 participants