
order sandbox by attempt or create time #130551


Open

wants to merge 1 commit into base: master
Conversation

yylt
Contributor

@yylt yylt commented Mar 4, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This issue is caused by a system clock rollback, which is prone to occur during system startup, especially when the NTP service starts at around the same time as static pods.

Detailed Reproduction Steps:

  1. Set up a node: Install a Kubernetes node using kubeadm and cri-o. Refer to the tutorial: https://github.com/cri-o/cri-o/blob/main/tutorials/kubeadm.md
  2. Stop NTP Service: Stop the NTP service (or chrony, systemd-timesyncd) on the node.
    systemctl stop ntp   # or chrony, systemd-timesyncd
  3. Create a Busybox Deployment: Deploy a simple busybox deployment in the Kubernetes cluster.
    kubectl create -f busybox-deploy
  4. Simulate Clock Rollback: Set the system clock back to an earlier time to simulate a clock rollback.
    date -s '0:0:0'
  5. Stop the Busybox Pod (Initial Stop): Stop the busybox pod using crictl.
    crictl stopp $(crictl pods |grep -w busybox|awk '{print $1}')
  6. Start NTP Service: Start the NTP service (or chrony, systemd-timesyncd) on the node.
    systemctl start ntp   # or chrony, systemd-timesyncd
  7. Stop the Busybox Pod (Second Stop): Stop the busybox pod again using crictl.
    crictl stopp $(crictl pods |grep -w busybox|awk '{print $1}')
  8. Check the crio service logs:
    journalctl -eu crio --no-pager -ocat
    ...
    time="2025-03-11T04:48:10.760615237Z" level=info msg="Running pod sandbox: default/busybox-74c5fcfb8-5kbmh/POD" id=07cdd735-da9d-4217-8987-bbac8656f566 name=/runtime.v1.RuntimeService/RunPodSandbox
    time="2025-03-11T04:48:10.76067006Z" level=warning msg="error reserving pod name k8s_busybox-74c5fcfb8-5kbmh_default_240ce90c-efe1-432b-aae0-88006b9ff369_2 for id 5d0d17bf66724375923d4467c4869bc822a6151195573e9e3afb418b0f302066: name is reserved
    ...

Making kubelet dependent on time services like NTP during system startup is unnecessary.

It would be a more convenient and robust approach if kubelet used a logical ordering (the sandbox Attempt counter) rather than wall-clock time to order pod sandboxes.

This could mitigate issues arising from clock rollback, especially in scenarios where NTP synchronization is delayed or fails during early system boot.
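In sketch form, the change prefers the logical Attempt counter and falls back to creation time only when sandbox metadata is missing (a minimal illustration; the type name here is illustrative, and runtimeapi refers to the CRI v1 package k8s.io/cri-api/pkg/apis/runtime/v1):

    // podSandboxByAttempt orders sandboxes newest-first: a higher Attempt wins,
    // and CreatedAt is consulted only when sandbox metadata is unavailable.
    type podSandboxByAttempt []*runtimeapi.PodSandbox

    func (p podSandboxByAttempt) Len() int      { return len(p) }
    func (p podSandboxByAttempt) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
    func (p podSandboxByAttempt) Less(i, j int) bool {
        if p[i].Metadata == nil || p[j].Metadata == nil {
            return p[i].CreatedAt > p[j].CreatedAt
        }
        return p[i].Metadata.Attempt > p[j].Metadata.Attempt
    }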

Which issue(s) this PR fixes:

Fixes #126514

Special notes for your reviewer:

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 4, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 4, 2025
@k8s-ci-robot
Contributor

Hi @yylt. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 4, 2025
@k8s-ci-robot k8s-ci-robot requested review from rphillips and tzneal March 4, 2025 05:49
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 4, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yylt
Once this PR has been reviewed and has the lgtm label, please assign dchen1107 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yylt
Contributor Author

yylt commented Mar 4, 2025

@k8s-ci-robot k8s-ci-robot requested a review from feiskyer March 4, 2025 05:57
@pacoxu
Member

pacoxu commented Mar 10, 2025

/cc @HirazawaUi @hshiina

@k8s-ci-robot k8s-ci-robot requested a review from HirazawaUi March 10, 2025 06:41
@k8s-ci-robot
Contributor

@pacoxu: GitHub didn't allow me to request PR reviews from the following users: hshiina.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @HirazawaUi @hshiina

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 10, 2025
    if p[i].Metadata == nil || p[j].Metadata == nil {
        return p[i].CreatedAt > p[j].CreatedAt
    }
    return p[i].Metadata.Attempt > p[j].Metadata.Attempt
Contributor

This issue appears to be caused by containerd, and this PR aims to implement some defensive programming in kubelet to address the bug in containerd.

At first glance, this seems like a positive change: when the physical clock is unreliable, it replaces the absolute time order with a logical incremental relationship.

I would like to raise a question here:

Both #126514 and containerd/containerd#9459 seem to be atypical issues.

That is, when an exception occurs here and reports error="failed to reserve sandbox name", there are already issues elsewhere before this point.

Could this fix potentially mask the real underlying problems? If it does mask the true errors, will users still have other means to troubleshoot this exception after we've obscured it?

I also have another question: can this issue be reproduced on CRI-O?

Contributor Author

Yes, it also happens on CRI-O.

This is a system time issue: if NTP synchronizes to the wrong time and causes a clock rollback, this problem will occur.

Contributor

As I mentioned earlier, the repeated creation of sandbox containers by kubelet is atypical behavior. This could be caused by:

  1. Race conditions in kubelet and rapid pod lifecycle changes leading to duplicate sandbox creation (this would be a bug).

  2. Containerd failing to create sandbox containers due to resource constraints, or sandbox containers entering erroneous states for other reasons.

For 2, we have already implemented some defensive programming in kubelet. We now sort the list of retrieved sandbox containers by creation time and select the first one.

Currently, kubelet strictly increments the Attempt value each time it tries to create a new sandbox for the same Pod (starting at 0, incrementing to 1 on retry after failure, and so on). This Attempt value is persisted and continues to increment even across node reboots or containerd crashes. Its value does not depend on system time; it is tied solely to the actual number of sandbox creation attempts.

From a defensive programming perspective, there appears to be no reason to reject this PR. In distributed systems, logical increment-based ordering is more reliable than absolute time order. Using the Attempt value as the sorting criterion is therefore justified.
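
As a toy illustration of that point (hypothetical values, sorted inline with sort.Slice rather than kubelet's actual comparator type): after a rollback, a retried sandbox can carry an earlier CreatedAt than the original, yet Attempt-based ordering still ranks it first.

    // Attempt 1 was created after Attempt 0 in logical order, but the clock
    // rolled back between the two creations, so its CreatedAt is smaller.
    sandboxes := []*runtimeapi.PodSandbox{
        {Metadata: &runtimeapi.PodSandboxMetadata{Attempt: 0}, CreatedAt: 2_000_000_000},
        {Metadata: &runtimeapi.PodSandboxMetadata{Attempt: 1}, CreatedAt: 1_000}, // clock rollback
    }
    sort.Slice(sandboxes, func(i, j int) bool {
        return sandboxes[i].Metadata.Attempt > sandboxes[j].Metadata.Attempt
    })
    // sandboxes[0] is now the Attempt-1 sandbox, even though its CreatedAt is
    // older; time-based sorting would wrongly rank the Attempt-0 sandbox first.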

cc @yujuhong @SergeyKanzhelev @tallclair sig-node maintainers for more accurate and comprehensive insights.

@HirazawaUi
Contributor

A minor suggestion: You could include an explanation in the PR description about why this PR is necessary. Otherwise, reviewers will need to start from scratch to understand the purpose of this PR and trace back its context.

You can add this information under the What this PR does / why we need it: section.

Thank you!

@yylt
Contributor Author

yylt commented Mar 11, 2025

You can add this information under the What this PR does / why we need it: section.

Thanks for the suggestion; I've added more info.

@HirazawaUi
Contributor

/release-note-none

As I mentioned in the comment, this is a defensive programming measure to prevent atypical issues, so it shouldn't be considered a bug.

/remove-kind bug
/kind feature

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/bug Categorizes issue or PR as related to a bug. labels Mar 16, 2025
@bart0sh bart0sh moved this from Triage to Waiting on Author in SIG Node: code and documentation PRs Mar 19, 2025
@bart0sh
Contributor

bart0sh commented Mar 19, 2025

@yylt please address review comments or close the PR, thanks!
BTW, the issue this PR is trying to fix is closed. Please reopen it if still relevant.

@yylt
Contributor Author

yylt commented Mar 19, 2025

As I mentioned in the comment, this is a defensive programming measure to prevent atypical issues, so it shouldn't be considered a bug.

It seems difficult to classify, but if the time-sync (NTP) service is not ready when kubelet starts pods, this type of error is triggered. The root cause of the error is the use of physical-time sorting, so I would consider it a bug.

Also, I don't have permission to reopen #126514 here.

@XmchxUp

XmchxUp commented Mar 27, 2025

I think the containerStatusByCreated here also needs to be modified, and the struct name should be renamed to xxxByAttemptAndCreated as well.

@yylt
Contributor Author

yylt commented Mar 28, 2025

I think the containerStatusByCreated here also needs to be modified, and the struct name should be renamed to xxxByAttemptAndCreated as well.

There is a binding relationship between a container and its sandbox.
Based on the cases observed so far, the only cause of the failure is the sandbox, and it cannot recover automatically.

@yylt
Contributor Author

yylt commented Mar 30, 2025

cc @hshiina

Is there anything I can do at this point to help? Or should I wait for other maintainers?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 28, 2025
Labels
  • area/kubelet
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/feature (Categorizes issue or PR as related to a new feature.)
  • lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)
  • needs-priority (Indicates a PR lacks a `priority/foo` label and requires one.)
  • needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • release-note-none (Denotes a PR that doesn't merit a release note.)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)
Projects
Status: Waiting on Author
Development

Successfully merging this pull request may close these issues:

#126514: You have to remove that sandbox to be able to reuse that name.

8 participants