Open
Description
What would you like to be added?
A mechanism to set a Pod to phase=Failed once its image pull has failed a certain number of times (perhaps configurable).
Currently, the Pod just stays in phase=Pending.
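To make the request concrete, here is a minimal sketch of the decision rule being asked for. It is not how the kubelet works today. The `pull_failures` counter, the `max_pull_failures` threshold, and the `proposed_phase` helper are all hypothetical names introduced for illustration. Only the waiting reasons `ErrImagePull` and `ImagePullBackOff` are values the kubelet reports today.

```python
from dataclasses import dataclass
from typing import List, Optional

# Waiting reasons the kubelet reports today when an image cannot be pulled.
IMAGE_PULL_FAILURE_REASONS = {"ErrImagePull", "ImagePullBackOff"}


@dataclass
class ContainerState:
    # Reason the container is waiting, or None if it is not waiting.
    waiting_reason: Optional[str]
    # Hypothetical counter of failed pull attempts (not an existing API field).
    pull_failures: int


def proposed_phase(current_phase: str,
                   containers: List[ContainerState],
                   max_pull_failures: int) -> str:
    """Return the phase the Pod would have under the proposed behavior."""
    # Only Pending Pods are affected; every other phase is left untouched.
    if current_phase != "Pending":
        return current_phase
    for container in containers:
        stuck_on_pull = container.waiting_reason in IMAGE_PULL_FAILURE_REASONS
        if stuck_on_pull and container.pull_failures >= max_pull_failures:
            # Proposed: give up and report a terminal phase.
            return "Failed"
    # Current behavior: the Pod keeps retrying and stays Pending.
    return "Pending"
```

With a threshold of 5, a Pod whose container has failed 5 pulls would be reported as Failed, while one with 4 failed pulls would remain Pending.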
Why is this needed?
This is especially problematic for Jobs submitted through a queueing system.
In a queued environment, the Job might only start running (that is, its Pods get created) hours or even days after the Job itself was created. By then, the user who submitted the Job might not realize their mistake until it's too late. Since these Pods block resources in the cluster, they might prevent other pending Jobs from starting.
If the Pods stay in phase=Pending, the job controller cannot do anything about them, as it only does "failure handling" once the Pods actually terminate with a phase=Failed.
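The following is a deliberately simplified model of why Pending Pods are invisible to failure handling. It is not the actual job controller code. The function names are hypothetical, and the comparison against `backoff_limit` only approximates how the Job's `backoffLimit` field is applied.

```python
from typing import List


def count_failed_pods(pod_phases: List[str]) -> int:
    """Count only Pods that reached the terminal phase Failed."""
    return sum(1 for phase in pod_phases if phase == "Failed")


def job_exceeded_backoff(pod_phases: List[str], backoff_limit: int) -> bool:
    """Simplified check: the Job fails once failures exceed the limit."""
    # Pods stuck in Pending never contribute to the failure count,
    # so they can never trigger this condition.
    return count_failed_pods(pod_phases) > backoff_limit
```

In this model, any number of Pods stuck in Pending leaves the failure count at zero, so the Job is never marked as failed, whereas a single Failed Pod is enough when the limit is 0.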
Metadata
Labels
Categorizes issue or PR as related to a new feature.
Indicates an issue or PR lacks a `triage/foo` label and requires one.
Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Categorizes an issue or PR as relevant to SIG Node.
Categorizes an issue or PR as relevant to WG Batch.