Scheduling: Retry on conflict while updating pvc or pv in VolumeBinding plugin #125338

@NoicFank

What would you like to be added?

For the VolumeBinding plugin, I would like updates to a PVC or PV in the PreBind stage to be retried on conflict.

Because there is no API rollback when an update fails, a pod with multiple PVCs can end up with only some of its PVCs updated before the pod is re-scheduled, which can leave the pod stuck in Pending forever (see Example for details).

The goal is to help pods get bound successfully and to greatly reduce the cases where a pod has been scheduled (assumed) in the cache but the final binding fails because of a PVC/PV update conflict.
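To make the request concrete, here is a minimal sketch of a retry-on-conflict loop. It is not the scheduler's code: `FakeStore`, `ConflictError`, and `update_with_retry` are hypothetical names, and the store is an in-memory stand-in that only mimics the API server's optimistic concurrency check on the resource version.

```python
import copy


class ConflictError(Exception):
    """Raised when an update carries a stale resource_version (a 409 from a real API server)."""


class FakeStore:
    """Hypothetical in-memory stand-in for the API server, with optimistic concurrency."""

    def __init__(self):
        self._objects = {}

    def create(self, name):
        self._objects[name] = {"name": name, "resource_version": 1, "annotations": {}}

    def get(self, name):
        return copy.deepcopy(self._objects[name])

    def update(self, obj):
        current = self._objects[obj["name"]]
        if obj["resource_version"] != current["resource_version"]:
            raise ConflictError(obj["name"])
        stored = copy.deepcopy(obj)
        stored["resource_version"] += 1
        self._objects[obj["name"]] = stored
        return copy.deepcopy(stored)


def update_with_retry(store, cached_obj, mutate, max_attempts=5):
    """Apply mutate() and update; on conflict, re-read the latest object and try again."""
    obj = copy.deepcopy(cached_obj)
    for _ in range(max_attempts):
        mutate(obj)
        try:
            return store.update(obj)
        except ConflictError:
            obj = store.get(obj["name"])  # refresh the stale copy, then retry
    raise ConflictError(f"gave up on {cached_obj['name']} after {max_attempts} attempts")


if __name__ == "__main__":
    store = FakeStore()
    store.create("pvcA2")
    stale = store.get("pvcA2")       # the copy the scheduler cached earlier
    newer = store.get("pvcA2")
    newer["annotations"]["other"] = "x"
    store.update(newer)              # another writer makes the cached copy stale

    def set_selected_node(pvc):
        pvc["annotations"]["volume.kubernetes.io/selected-node"] = "nodeA"

    print(update_with_retry(store, stale, set_selected_node))
```

A real implementation would probably also need to check, after re-reading, that the refreshed object is still compatible with the scheduling decision before re-applying the change.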

@Huang-Wei @cofyc @kerthcet PTAL, thanks.

/sig scheduling

Why is this needed?

Currently, once the PVC/PV is out of date, the update fails, the pod fails scheduling in the PreBind stage for this round, and it goes into backoff to be re-scheduled.

In most situations the current implementation is fine. But when a pod has multiple PVCs, it can leave the pod stuck in the Pending state forever in some cases.
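The partial-update problem can be sketched as follows. This is a simplified illustration, not the plugin's code: `bind_all` and `flaky_update` are hypothetical, and the failure on `pvcA2` is hard-coded to stand in for a conflict.

```python
class ConflictError(Exception):
    """Stands in for an API conflict on a stale object."""


def bind_all(pvc_names, update):
    """Update each PVC in order and stop at the first conflict, with no rollback."""
    updated = []
    for name in pvc_names:
        try:
            update(name)
        except ConflictError:
            return updated, name  # earlier updates stay applied
        updated.append(name)
    return updated, None


def flaky_update(name):
    # Assumption for the illustration: only pvcA2 is out of date.
    if name == "pvcA2":
        raise ConflictError(name)


updated, failed = bind_all(["pvcA1", "pvcA2"], flaky_update)
print("already updated:", updated, "failed on:", failed)
```

The pod fails this scheduling round, but the change to the first PVC has already been persisted and can constrain where the pod may go next time.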

Example

There are only:

  • two nodes: nodeA & nodeB,
  • two pods: podA & podB. podA has pvcA1 & pvcA2; podB has pvcB1 & pvcB2.
    We want the two pods to be placed on different nodes using TopologySpreadConstraints.

Then:

  • Scheduling podA: it is assumed to nodeA in the cache but fails in the VolumeBinding plugin's PreBind stage. The annotations on pvcA1 are updated successfully, while updating pvcA2 fails with a conflict.
  • Scheduling podB: it is successfully bound to nodeA. Meanwhile, pvcA1 becomes bound.
  • Re-scheduling podA: TopologySpreadConstraints requires podA to go to nodeB, but the node affinity of the PV bound to pvcA1 requires podA to go to nodeA.
    As a result, podA is stuck in Pending forever, unless we delete pvcA1.
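The deadlock in the last step comes down to two filters with no node in common. A minimal sketch, treating each constraint as a plain set of allowed nodes (a simplification; `feasible_nodes` is a hypothetical helper, not a scheduler API):

```python
def feasible_nodes(all_nodes, spread_allowed, pv_affinity_allowed):
    """Nodes that pass both filters; an empty result means the pod stays Pending."""
    return set(all_nodes) & set(spread_allowed) & set(pv_affinity_allowed)


nodes = {"nodeA", "nodeB"}

# podB already runs on nodeA, so the spread constraint leaves only nodeB for podA.
spread_allowed = nodes - {"nodeA"}

# pvcA1 is bound to a PV whose node affinity pins it to nodeA.
pv_affinity_allowed = {"nodeA"}

print(feasible_nodes(nodes, spread_allowed, pv_affinity_allowed))
```

With these inputs no node satisfies both constraints, which matches the stuck state described in the example.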


Metadata

Assignees

No one assigned

Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
