Extend the Job API for BackoffLimitPerIndex #119294
Conversation
/assign @alculquicondor @deads2k
/triage accepted
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
api, validation, and gating all lgtm
/lgtm
LGTM label has been added. Git tree hash: 169a5990d2b6ed39b0704823cb0db960a1055b61
}
if oldJob.Spec.MaxFailedIndexes == nil {
	newJob.Spec.MaxFailedIndexes = nil
}
also need to check here that there are no new usages of the enum.
The check is simple: if FailIndex was used in the old object, you preserve all the rules. Otherwise, cleanup the rules as you did for PrepareForCreate.
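The suggested check could be sketched roughly as follows. This is a simplified illustration, not the actual Kubernetes strategy code: the type and helper names (`usesFailIndex`, `prepareForUpdate`, the trimmed-down `JobSpec`) are assumptions standing in for the real `batch/v1` API and registry strategy.

```go
package main

import "fmt"

// Minimal stand-ins for the real batch/v1 types (simplified; names assumed).
type PodFailurePolicyAction string

const PodFailurePolicyActionFailIndex PodFailurePolicyAction = "FailIndex"

type PodFailurePolicyRule struct{ Action PodFailurePolicyAction }

type JobSpec struct {
	PodFailurePolicy []PodFailurePolicyRule
}

type Job struct{ Spec JobSpec }

// usesFailIndex reports whether any pod failure policy rule uses FailIndex.
func usesFailIndex(spec *JobSpec) bool {
	for _, r := range spec.PodFailurePolicy {
		if r.Action == PodFailurePolicyActionFailIndex {
			return true
		}
	}
	return false
}

// prepareForUpdate mirrors the suggestion: if FailIndex was used in the old
// object (or the feature gate is on), preserve all rules; otherwise clean up
// FailIndex rules from the new object, as PrepareForCreate would.
func prepareForUpdate(gateEnabled bool, newJob, oldJob *Job) {
	if gateEnabled || usesFailIndex(&oldJob.Spec) {
		return // preserve all rules
	}
	kept := newJob.Spec.PodFailurePolicy[:0]
	for _, r := range newJob.Spec.PodFailurePolicy {
		if r.Action != PodFailurePolicyActionFailIndex {
			kept = append(kept, r)
		}
	}
	newJob.Spec.PodFailurePolicy = kept
}

func main() {
	oldJob := &Job{}
	newJob := &Job{Spec: JobSpec{PodFailurePolicy: []PodFailurePolicyRule{
		{Action: PodFailurePolicyActionFailIndex},
		{Action: "Ignore"},
	}}}
	prepareForUpdate(false, newJob, oldJob)
	fmt.Println(len(newJob.Spec.PodFailurePolicy))
}
```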
/hold
until this comment is addressed. FYI @deads2k
@alculquicondor pod failure policy is immutable, so IIUC I cannot drop the rules on update. Now, there are two cases:
- if the oldJob didn't contain FailIndex, the newJob also cannot, because the pod failure policy is immutable.
- if the oldJob contained FailIndex, then it also had to have BackoffLimitPerIndex, so the newJob also has BackoffLimitPerIndex. Because the new job has BackoffLimitPerIndex, the rules will pass validation.
So, I think it is ok, but a little convoluted.
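The invariant relied on in the second case can be sketched as a validation check. This is a minimal illustration under assumed, simplified types, not the real `kubernetes/kubernetes` validation code:

```go
package main

import "fmt"

// Simplified stand-ins for the batch/v1 types discussed above.
type Rule struct{ Action string }

type JobSpec struct {
	BackoffLimitPerIndex *int32
	PodFailurePolicy     []Rule
}

// validate enforces the invariant: a FailIndex rule is only allowed when
// backoffLimitPerIndex is set. Since pod failure policy is immutable and a
// valid old object satisfied this, a valid update satisfies it too.
func validate(spec JobSpec) error {
	for _, r := range spec.PodFailurePolicy {
		if r.Action == "FailIndex" && spec.BackoffLimitPerIndex == nil {
			return fmt.Errorf("FailIndex action requires backoffLimitPerIndex to be set")
		}
	}
	return nil
}

func main() {
	limit := int32(2)
	ok := JobSpec{BackoffLimitPerIndex: &limit, PodFailurePolicy: []Rule{{Action: "FailIndex"}}}
	bad := JobSpec{PodFailurePolicy: []Rule{{Action: "FailIndex"}}}
	fmt.Println(validate(ok) == nil, validate(bad) != nil)
}
```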
Oh, in that case just leave a comment explaining why we don't treat the failure policies on update.
Done
Thanks for the ping, done. Not sure the robot will pick this up.
/test pull-kubernetes-e2e-kind-ipv6
It did!
/lgtm
LGTM label has been added. Git tree hash: bcea6692aaa645d35fd44227d3c605e9486597f7
@@ -38,7 +40,11 @@ func SetDefaults_Job(obj *batchv1.Job) {
	obj.Spec.Parallelism = utilpointer.Int32(1)
}
if obj.Spec.BackoffLimit == nil {
	obj.Spec.BackoffLimit = utilpointer.Int32(6)
	if obj.Spec.BackoffLimitPerIndex != nil {
@deads2k and @liggitt pointed out in my review that we should have conditional defaulting for alpha features.
I guarded this under a feature flag in https://github.com/kubernetes/kubernetes/pull/119301/files#diff-850ee9a00a92fc90e918491d581eac3c5b08208c8f6d9f88fa6a8cddee384fda so maybe that is needed?
IIUC the difference between backoffLimitPerIndex and podRecreationPolicy is that a value for podRecreationPolicy is required when the feature is enabled. Whereas, for backoffLimitPerIndex, the feature is opt-in, so even if the feature gate is enabled, leaving .spec.backoffLimitPerIndex=nil is supported.
So, we don't default backoffLimitPerIndex; we default backoffLimit to max int32 when a user specified backoffLimitPerIndex only.
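The defaulting described above could be sketched like this. It is a simplification of the real SetDefaults_Job (which also guards on the feature gate); the helper names here are assumptions for illustration:

```go
package main

import (
	"fmt"
	"math"
)

// Trimmed-down JobSpec with only the two fields involved in this defaulting.
type JobSpec struct {
	BackoffLimit         *int32
	BackoffLimitPerIndex *int32
}

func int32Ptr(v int32) *int32 { return &v }

// setDefaultBackoffLimit never defaults backoffLimitPerIndex. When the user
// set only backoffLimitPerIndex, backoffLimit defaults to max int32 so the
// per-index limit is effectively the only bound; otherwise the historical
// default of 6 applies.
func setDefaultBackoffLimit(spec *JobSpec) {
	if spec.BackoffLimit != nil {
		return // user-provided value wins
	}
	if spec.BackoffLimitPerIndex != nil {
		spec.BackoffLimit = int32Ptr(math.MaxInt32)
	} else {
		spec.BackoffLimit = int32Ptr(6)
	}
}

func main() {
	perIndex := JobSpec{BackoffLimitPerIndex: int32Ptr(1)}
	plain := JobSpec{}
	setDefaultBackoffLimit(&perIndex)
	setDefaultBackoffLimit(&plain)
	fmt.Println(*perIndex.BackoffLimit == math.MaxInt32, *plain.BackoffLimit)
}
```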
So if this feature gets turned off, we would default to utilpointer.Int32(math.MaxInt32) even if the feature was disabled?
Since this is not a new field, it's acceptable.
force-pushed from a7008a1 to 3557fec
@deads2k @alculquicondor there is a change to the annotations as we need a separate counter for pod failures which are excluded from backoff limit, but counted in the backoff delay. As indicated there: #118009 (comment) I hesitate between two approaches, but leaning towards a dedicated annotation, as the API seems cleaner.
/retest
force-pushed from 23192cc to cf0b747
additional annotation lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deads2k, mimowo. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/lgtm
LGTM label has been added. Git tree hash: 63f8e394d3cceab4f9b43838e0289c42b8188915
@mimowo: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/test pull-kubernetes-unit
Implementation PR approved
What type of PR is this?
/kind documentation
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Tracking Issue: kubernetes/enhancements#3850
Special notes for your reviewer:
This is the API part extracted from the PR where it was reviewed, coupled with the Job controller changes: #118009
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: