For a CronJob, when I first apply my manifest, GKE Autopilot mutates the result and applies the minimum requirements properly. However, when I run kubectl apply again, it fails with a validation failure.
Is there a way to work around this?
The manifest I'm applying is:
---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: run-dummy
  namespace: "gke-val-fail"
spec:
  concurrencyPolicy: Forbid
  schedule: 0 0 21 2 0
  jobTemplate:
    spec:
      template:
        metadata:
          name: dummy
        spec:
          restartPolicy: Never
          containers:
          - name: dummy
            image: hello-world:latest
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 200m
                memory: 1024Mi
What error are you seeing?
Sorry! Here is the output of the two kubectl apply commands, with the second one failing (my e-mail is redacted):
$ kubectl apply -f cronjob.yml
namespace/gke-val-fail created
Warning: autopilot-default-resources-mutator:Autopilot updated CronJob gke-val-fail/run-dummy: defaulted unspecified resources for containers [dummy] (see http://g.co/gke/autopilot-defaults)
cronjob.batch/run-dummy created
$ kubectl apply -f cronjob.yml
namespace/gke-val-fail unchanged
Error from server (GKE Warden constraints violations): error when applying patch:
{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"dummy"}],"containers":[{"name":"dummy","resources":{"requests":{"cpu":"200m","memory":"1024Mi"}}}]}}}}}}
to:
Resource: "batch/v1, Resource=cronjobs", GroupVersionKind: "batch/v1, Kind=CronJob"
Name: "run-dummy", Namespace: "gke-val-fail"
for: "cronjob.yml": error when patching "cronjob.yml": admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-pod-limit-constraints]":["container 'dummy' does not have resource==limits which required in Autopilot clusters.","Total cpu requests for workload 'run-dummy' is lower than the Autopilot minimum required of '250m'."]}
Requested by user: '***REDACTED***', groups: 'system:authenticated'.
$
Is the manifest in your post the one that you applied in the first command (which then got mutated successfully)?
I think the issue might be that you're applying the old, pre-mutation manifest again while the CronJob is currently running, and GKE isn't letting you patch the existing CronJob with "bad" resource requests/limits.
Interesting. What makes the CronJob different from a Deployment?
If I apply a deployment with the same pod spec resources once to create and a second time to patch, it just amends the second request to minimums with the expected warnings.
$ kubectl apply -f deployment.yml
namespace/gke-val-fail created
Warning: autopilot-default-resources-mutator:Autopilot updated Deployment gke-val-fail/deploy-dummy: defaulted unspecified resources for containers [nginx] (see http://g.co/gke/autopilot-defaults)
deployment.apps/deploy-dummy created
$ kubectl apply -f deployment.yml
namespace/gke-val-fail unchanged
Warning: autopilot-default-resources-mutator:Autopilot updated Deployment gke-val-fail/deploy-dummy: adjusted resources to meet requirements for containers [nginx] (see http://g.co/gke/autopilot-resources)
deployment.apps/deploy-dummy configured
$
---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-dummy
  namespace: "gke-val-fail"
spec:
  replicas: 0
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        resources:
          requests:
            cpu: 200m
            memory: 1024Mi
I'm not sure tbh. What happens if you try to deploy the cronjob with `limits` specified equal to `requests`?
Adding a limits section alongside the requests and dropping the requests in favor of limits both fail, in different ways.
The two failing scenarios are below: the first has both limits and requests, and the second has only limits.
$ kubectl apply -f cronjob-add-limits.yml
namespace/gke-val-fail created
Warning: autopilot-default-resources-mutator:Autopilot updated CronJob gke-val-fail/run-dummy: adjusted resources to meet requirements for containers [dummy] (see http://g.co/gke/autopilot-resources)
cronjob.batch/run-dummy created
$ kubectl apply -f cronjob-add-limits.yml
namespace/gke-val-fail unchanged
Error from server (GKE Warden constraints violations): error when applying patch:
{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"dummy"}],"containers":[{"name":"dummy","resources":{"limits":{"cpu":"200m","memory":"1024Mi"},"requests":{"cpu":"200m","memory":"1024Mi"}}}]}}}}}}
to:
Resource: "batch/v1, Resource=cronjobs", GroupVersionKind: "batch/v1, Kind=CronJob"
Name: "run-dummy", Namespace: "gke-val-fail"
for: "cronjob-add-limits.yml": error when patching "cronjob-add-limits.yml": admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-pod-limit-constraints]":["Total cpu requests for workload 'run-dummy' is lower than the Autopilot minimum required of '250m'."]}
Requested by user: '***REDACTED***', groups: 'system:authenticated'.
$
---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: run-dummy
  namespace: "gke-val-fail"
spec:
  concurrencyPolicy: Forbid
  schedule: 0 0 21 2 0
  jobTemplate:
    spec:
      template:
        metadata:
          name: dummy
        spec:
          restartPolicy: Never
          containers:
          - name: dummy
            image: hello-world:latest
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 200m
                memory: 1024Mi
              limits:
                cpu: 200m
                memory: 1024Mi
$ kubectl apply -f cronjob-limits.yml
namespace/gke-val-fail created
Warning: autopilot-default-resources-mutator:Autopilot updated CronJob gke-val-fail/run-dummy: defaulted unspecified resources for containers [dummy] (see http://g.co/gke/autopilot-defaults)
cronjob.batch/run-dummy created
$ kubectl apply -f cronjob-limits.yml
namespace/gke-val-fail unchanged
The CronJob "run-dummy" is invalid: spec.jobTemplate.spec.template.spec.containers[0].resources.requests: Invalid value: "250m": must be less than or equal to cpu limit of 200m
$
---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: run-dummy
  namespace: "gke-val-fail"
spec:
  concurrencyPolicy: Forbid
  schedule: 0 0 21 2 0
  jobTemplate:
    spec:
      template:
        metadata:
          name: dummy
        spec:
          restartPolicy: Never
          containers:
          - name: dummy
            image: hello-world:latest
            imagePullPolicy: Always
            resources:
              limits:
                cpu: 200m
                memory: 1024Mi
Interesting. I'm sorry if I'm basically trial and erroring here, I'm not familiar with this aspect of autopilot's validation!
When autopilot mutated your cronjob, what did it mutate the resource values to?
Looks like GKE Warden mutates it to:
resources:
  limits:
    cpu: 250m
    ephemeral-storage: 1Gi
    memory: 1Gi
  requests:
    cpu: 250m
    ephemeral-storage: 1Gi
    memory: 1Gi
Here is what the original cronjob.yml turns into on the first apply; running the same command again errors as before:
$ kubectl apply -f cronjob.yml -o yaml
Warning: autopilot-default-resources-mutator:Autopilot updated CronJob gke-val-fail/run-dummy: defaulted unspecified resources for containers [dummy] (see http://g.co/gke/autopilot-defaults)
apiVersion: v1
items:
- apiVersion: v1
  kind: Namespace
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"name":"gke-val-fail"}}
    creationTimestamp: "2024-05-23T18:08:05Z"
    labels:
      kubernetes.io/metadata.name: gke-val-fail
    name: gke-val-fail
    resourceVersion: "69563409"
    uid: 5795d5a2-a0c3-470e-891a-5176a59e99d0
  spec:
    finalizers:
    - kubernetes
  status:
    phase: Active
- apiVersion: batch/v1
  kind: CronJob
  metadata:
    annotations:
      autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"requests":{"cpu":"200m","memory":"1Gi"},"name":"dummy"}]},"output":{"containers":[{"limits":{"cpu":"250m","ephemeral-storage":"1Gi","memory":"1Gi"},"requests":{"cpu":"250m","ephemeral-storage":"1Gi","memory":"1Gi"},"name":"dummy"}]},"modified":true}'
      autopilot.gke.io/warden-version: 2.7.62
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"batch/v1","kind":"CronJob","metadata":{"annotations":{},"name":"run-dummy","namespace":"gke-val-fail"},"spec":{"concurrencyPolicy":"Forbid","jobTemplate":{"spec":{"template":{"metadata":{"name":"dummy"},"spec":{"containers":[{"image":"hello-world:latest","imagePullPolicy":"Always","name":"dummy","resources":{"requests":{"cpu":"200m","memory":"1024Mi"}}}],"restartPolicy":"Never"}}}},"schedule":"0 0 21 2 0"}}
    creationTimestamp: "2024-05-23T18:08:06Z"
    generation: 1
    name: run-dummy
    namespace: gke-val-fail
    resourceVersion: "69563413"
    uid: e889a193-01d1-4fb6-81a8-72d291c65aa8
  spec:
    concurrencyPolicy: Forbid
    failedJobsHistoryLimit: 1
    jobTemplate:
      metadata:
        creationTimestamp: null
      spec:
        template:
          metadata:
            creationTimestamp: null
            name: dummy
          spec:
            containers:
            - image: hello-world:latest
              imagePullPolicy: Always
              name: dummy
              resources:
                limits:
                  cpu: 250m
                  ephemeral-storage: 1Gi
                  memory: 1Gi
                requests:
                  cpu: 250m
                  ephemeral-storage: 1Gi
                  memory: 1Gi
              securityContext:
                capabilities:
                  drop:
                  - NET_RAW
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            dnsPolicy: ClusterFirst
            restartPolicy: Never
            schedulerName: default-scheduler
            securityContext:
              seccompProfile:
                type: RuntimeDefault
            terminationGracePeriodSeconds: 30
            tolerations:
            - effect: NoSchedule
              key: kubernetes.io/arch
              operator: Equal
              value: amd64
    schedule: 0 0 21 2 0
    successfulJobsHistoryLimit: 3
    suspend: false
  status: {}
kind: List
metadata: {}
$ kubectl apply -f cronjob.yml -o yaml
Error from server (GKE Warden constraints violations): error when applying patch:
{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"dummy"}],"containers":[{"name":"dummy","resources":{"requests":{"cpu":"200m","memory":"1024Mi"}}}]}}}}}}
to:
Resource: "batch/v1, Resource=cronjobs", GroupVersionKind: "batch/v1, Kind=CronJob"
Name: "run-dummy", Namespace: "gke-val-fail"
for: "cronjob.yml": error when patching "cronjob.yml": admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-pod-limit-constraints]":["container 'dummy' does not have resource==limits which required in Autopilot clusters.","Total cpu requests for workload 'run-dummy' is lower than the Autopilot minimum required of '250m'."]}
Requested by user: '***REDACTED***', groups: 'system:authenticated'.
$
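One possible workaround, untested here: write the manifest so that it already matches what Autopilot mutated it to, so the patch on the second apply no longer carries values below the minimums. This assumes Warden only rejects patches that fall short of the constraints named in the error (cpu requests of at least 250m, and limits present and equal to requests). The resource values below are copied from the mutated output above; they may differ on other Autopilot versions or compute classes.

```yaml
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: run-dummy
  namespace: "gke-val-fail"
spec:
  concurrencyPolicy: Forbid
  schedule: 0 0 21 2 0
  jobTemplate:
    spec:
      template:
        metadata:
          name: dummy
        spec:
          restartPolicy: Never
          containers:
          - name: dummy
            image: hello-world:latest
            imagePullPolicy: Always
            resources:
              # Assumption: these mirror the values Autopilot wrote on the
              # first apply (250m CPU minimum, 1Gi ephemeral-storage default).
              requests:
                cpu: 250m
                memory: 1Gi
                ephemeral-storage: 1Gi
              # Assumption: limits equal to requests satisfies the
              # autogke-pod-limit-constraints check quoted in the error.
              limits:
                cpu: 250m
                memory: 1Gi
                ephemeral-storage: 1Gi
```

If this works as hoped, a repeated kubectl apply should have nothing left to patch in the resources section, but I have not confirmed that on a cluster.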