Extend the Job API for BackoffLimitPerIndex #119294

Merged
merged 2 commits on Jul 18, 2023

Conversation

mimowo
Contributor

@mimowo mimowo commented Jul 13, 2023

What type of PR is this?

/kind documentation
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Tracking Issue: kubernetes/enhancements#3850

Special notes for your reviewer:

This is the API portion extracted from #118009, where it was reviewed together with the Job controller changes.

Does this PR introduce a user-facing change?

Extend the Job API for the alpha version of BackoffLimitPerIndex

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/documentation Categorizes issue or PR as related to documentation. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 13, 2023
@k8s-ci-robot k8s-ci-robot added area/code-generation kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 13, 2023
@mimowo
Contributor Author

mimowo commented Jul 13, 2023

/assign @alculquicondor @deads2k

@jiahuif
Member

jiahuif commented Jul 13, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 13, 2023
@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 13, 2023
@deads2k
Contributor

deads2k commented Jul 13, 2023

api, validation, and gating all lgtm

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 13, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 169a5990d2b6ed39b0704823cb0db960a1055b61

	}
	if oldJob.Spec.MaxFailedIndexes == nil {
		newJob.Spec.MaxFailedIndexes = nil
	}
Member

We also need to check here that there are no new usages of the enum.

The check is simple: if FailIndex was used in the old object, you preserve all the rules. Otherwise, clean up the rules as you did for PrepareForCreate.

Contributor Author

/hold
I still need to address this comment. FYI @deads2k

Contributor Author

@alculquicondor pod failure policy is immutable, so IIUC I cannot drop the rules on update. Now, there are two cases:

  1. if the oldJob didn't contain FailIndex, the newJob also cannot, because the pod failure policy is immutable.
  2. if the oldJob contained FailIndex, then it also had to have BackoffLimitPerIndex, so the newJob also has BackoffLimitPerIndex. Because the new job has BackoffLimitPerIndex, the rules will pass validation.

So, I think it is ok, but a little convoluted.
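
The reasoning above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the PR: `JobSpec`, `Rule`, and `dropDisabledFieldsOnUpdate` are hypothetical simplified stand-ins, and the outer feature-gate check is an assumption, since the quoted fragment shows only the inner `MaxFailedIndexes` condition.

```go
package main

import "fmt"

// Rule and JobSpec are simplified, hypothetical stand-ins for the batch API
// types; only the fields discussed in this thread are modeled.
type Rule struct{ Action string }

type JobSpec struct {
	BackoffLimitPerIndex *int32
	MaxFailedIndexes     *int32
	PodFailurePolicy     []Rule
}

func int32Ptr(v int32) *int32 { return &v }

// dropDisabledFieldsOnUpdate clears the new fields on update when the gate
// is off (assumed outer check) and the stored object never used them.
func dropDisabledFieldsOnUpdate(gateEnabled bool, oldSpec, newSpec *JobSpec) {
	if gateEnabled {
		return
	}
	if oldSpec.BackoffLimitPerIndex == nil {
		newSpec.BackoffLimitPerIndex = nil
	}
	if oldSpec.MaxFailedIndexes == nil {
		newSpec.MaxFailedIndexes = nil
	}
	// FailIndex rules are deliberately left alone. The pod failure policy is
	// immutable, so the new object can only carry FailIndex if the old one
	// did, and then backoffLimitPerIndex is set and preserved as well.
}

func main() {
	// Gate off, old object never used the fields: the new values are dropped.
	oldSpec := &JobSpec{}
	newSpec := &JobSpec{BackoffLimitPerIndex: int32Ptr(1), MaxFailedIndexes: int32Ptr(5)}
	dropDisabledFieldsOnUpdate(false, oldSpec, newSpec)
	fmt.Println(newSpec.BackoffLimitPerIndex == nil, newSpec.MaxFailedIndexes == nil) // true true

	// Gate off, old object already used FailIndex with backoffLimitPerIndex:
	// both the field and the rule survive the update.
	rules := []Rule{{Action: "FailIndex"}}
	oldSpec = &JobSpec{BackoffLimitPerIndex: int32Ptr(1), PodFailurePolicy: rules}
	newSpec = &JobSpec{BackoffLimitPerIndex: int32Ptr(1), PodFailurePolicy: rules}
	dropDisabledFieldsOnUpdate(false, oldSpec, newSpec)
	fmt.Println(*newSpec.BackoffLimitPerIndex, len(newSpec.PodFailurePolicy)) // 1 1
}
```

The second case is the one that makes special handling of the rules unnecessary: whenever FailIndex can appear in the new object, the field it depends on is already retained.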

Member

oh in that case just leave a comment for why we don't treat the failure policies on update.

Contributor Author

Done

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 13, 2023
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 14, 2023
@mimowo
Contributor Author

mimowo commented Jul 14, 2023

Fixed in #119305

Please rebase

Thanks for the ping, done. Not sure the robot will pick this up.

@mimowo
Contributor Author

mimowo commented Jul 14, 2023

/test pull-kubernetes-e2e-kind-ipv6

@alculquicondor
Member

It did!
But still on hold for implementation, which I'm looking at next.

@alculquicondor
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 14, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: bcea6692aaa645d35fd44227d3c605e9486597f7

@@ -38,7 +40,11 @@ func SetDefaults_Job(obj *batchv1.Job) {
 		obj.Spec.Parallelism = utilpointer.Int32(1)
 	}
 	if obj.Spec.BackoffLimit == nil {
-		obj.Spec.BackoffLimit = utilpointer.Int32(6)
+		if obj.Spec.BackoffLimitPerIndex != nil {
Contributor

@deads2k and @liggitt pointed out in my review that we should have conditional defaulting for alpha features.

I guarded this under a feature flag in https://github.com/kubernetes/kubernetes/pull/119301/files#diff-850ee9a00a92fc90e918491d581eac3c5b08208c8f6d9f88fa6a8cddee384fda so maybe that is needed?

Contributor Author

IIUC the difference between backoffLimitPerIndex and podRecreationPolicy is that a value for podRecreationPolicy is required when the feature is enabled, whereas backoffLimitPerIndex is opt-in: even if the feature gate is enabled, leaving .spec.backoffLimitPerIndex=nil is supported.

Contributor Author

So we don't default backoffLimitPerIndex; we default backoffLimit to max int32 when a user specifies only backoffLimitPerIndex.

Contributor

So if the feature gate gets turned off, we would still default backoffLimit to utilpointer.Int32(math.MaxInt32)?

Member

Since this is not a new field, it's acceptable.

@mimowo mimowo force-pushed the backoff-limit-per-index-api branch from a7008a1 to 3557fec Compare July 18, 2023 09:29
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2023
@mimowo
Contributor Author

mimowo commented Jul 18, 2023

@deads2k @alculquicondor there is a change to the annotations: we need a separate counter for pod failures that are excluded from the backoff limit but still counted in the backoff delay. As indicated in #118009 (comment), I was hesitating between two approaches, but I'm leaning towards a dedicated annotation, as the API seems cleaner.

@alculquicondor
Member

/retest

pkg/apis/batch/types.go: three outdated review threads, all resolved
@mimowo mimowo force-pushed the backoff-limit-per-index-api branch from 23192cc to cf0b747 Compare July 18, 2023 14:41
@deads2k
Contributor

deads2k commented Jul 18, 2023

additional annotation lgtm

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@deads2k
Contributor

deads2k commented Jul 18, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 63f8e394d3cceab4f9b43838e0289c42b8188915

@k8s-ci-robot
Contributor

k8s-ci-robot commented Jul 18, 2023

@mimowo: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-e2e-gce-cos-alpha-features | 23192cc60b7af69cf3fdb572157be6246bcd2b3e | link | false | /test pull-kubernetes-e2e-gce-cos-alpha-features |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mimowo
Contributor Author

mimowo commented Jul 18, 2023

/test pull-kubernetes-unit

@alculquicondor
Member

Implementation PR approved
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 18, 2023
@k8s-ci-robot k8s-ci-robot merged commit f3f5dd9 into kubernetes:master Jul 18, 2023
12 of 13 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jul 18, 2023
@mimowo mimowo deleted the backoff-limit-per-index-api branch November 29, 2023 15:01
Labels
approved, area/code-generation, cncf-cla: yes, kind/api-change, kind/documentation, kind/feature, lgtm, needs-priority, release-note, sig/api-machinery, sig/apps, size/XXL, triage/accepted

7 participants