kubectl apply of a CronJob with low resource requests on GKE Autopilot fails on the second apply

When I first apply my CronJob manifest, GKE Autopilot mutates the resource and applies the minimum requirements properly. However, when I run kubectl apply again, it fails with a validation error.

Is there a way to work around this?

The manifest I'm applying is:

 

 

---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: run-dummy
  namespace: "gke-val-fail"
spec:
  concurrencyPolicy: Forbid
  schedule: 0 0 21 2 0
  jobTemplate:
    spec:
      template:
        metadata:
          name: dummy
        spec:
          restartPolicy: Never
          containers:
          - name: dummy
            image:
              hello-world:latest
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 200m
                memory: 1024Mi

 

 


What error are you seeing? 

Sorry! Here is the output of the two kubectl apply commands, with the second one failing (my e-mail is redacted):

$ kubectl apply -f cronjob.yml
namespace/gke-val-fail created
Warning: autopilot-default-resources-mutator:Autopilot updated CronJob gke-val-fail/run-dummy: defaulted unspecified resources for containers [dummy] (see http://g.co/gke/autopilot-defaults)
cronjob.batch/run-dummy created
$ kubectl apply -f cronjob.yml
namespace/gke-val-fail unchanged
Error from server (GKE Warden constraints violations): error when applying patch:
{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"dummy"}],"containers":[{"name":"dummy","resources":{"requests":{"cpu":"200m","memory":"1024Mi"}}}]}}}}}}
to:
Resource: "batch/v1, Resource=cronjobs", GroupVersionKind: "batch/v1, Kind=CronJob"
Name: "run-dummy", Namespace: "gke-val-fail"
for: "cronjob.yml": error when patching "cronjob.yml": admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-pod-limit-constraints]":["container 'dummy' does not have resource==limits which required in Autopilot clusters.","Total cpu requests for workload 'run-dummy' is lower than the Autopilot minimum required of '250m'."]}
Requested by user: '***REDACTED***', groups: 'system:authenticated'.
$

Is the manifest in your post the one that you applied in the first command (which then got mutated successfully)?

I think the issue might be that you're applying the old, pre-mutation manifest again while the CronJob already exists in its mutated form, and GKE isn't letting you patch the existing CronJob with "bad" resource requests/limits.

Interesting. What makes the CronJob different from a Deployment?

If I apply a Deployment with the same pod spec resources, once to create and a second time to patch, Autopilot just adjusts the second request to the minimums, with the expected warnings.

Execution Log:
$ kubectl apply -f deployment.yml 
namespace/gke-val-fail created
Warning: autopilot-default-resources-mutator:Autopilot updated Deployment gke-val-fail/deploy-dummy: defaulted unspecified resources for containers [nginx] (see http://g.co/gke/autopilot-defaults)
deployment.apps/deploy-dummy created
$ kubectl apply -f deployment.yml
namespace/gke-val-fail unchanged
Warning: autopilot-default-resources-mutator:Autopilot updated Deployment gke-val-fail/deploy-dummy: adjusted resources to meet requirements for containers [nginx] (see http://g.co/gke/autopilot-resources)
deployment.apps/deploy-dummy configured
$ 
deployment.yml:
---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-dummy
  namespace: "gke-val-fail"
spec:
  replicas: 0
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        resources:
          requests:
            cpu: 200m
            memory: 1024Mi

 

I'm not sure tbh. What happens if you try to deploy the cronjob with `limits` specified equal to `requests`? 

Both adding a limits section and dropping the requests in favor of limits fail, each in a different way.

There are two failing scenarios below: the first has both limits and requests, and the second has only limits.

Both Limits and Requests
Limits & Requests Execution Log
$ kubectl apply -f cronjob-add-limits.yml
namespace/gke-val-fail created
Warning: autopilot-default-resources-mutator:Autopilot updated CronJob gke-val-fail/run-dummy: adjusted resources to meet requirements for containers [dummy] (see http://g.co/gke/autopilot-resources)
cronjob.batch/run-dummy created
$ kubectl apply -f cronjob-add-limits.yml
namespace/gke-val-fail unchanged
Error from server (GKE Warden constraints violations): error when applying patch:
{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"dummy"}],"containers":[{"name":"dummy","resources":{"limits":{"cpu":"200m","memory":"1024Mi"},"requests":{"cpu":"200m","memory":"1024Mi"}}}]}}}}}}
to:
Resource: "batch/v1, Resource=cronjobs", GroupVersionKind: "batch/v1, Kind=CronJob"
Name: "run-dummy", Namespace: "gke-val-fail"
for: "cronjob-add-limits.yml": error when patching "cronjob-add-limits.yml": admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-pod-limit-constraints]":["Total cpu requests for workload 'run-dummy' is lower than the Autopilot minimum required of '250m'."]}
Requested by user: '***REDACTED***', groups: 'system:authenticated'.
$
cronjob-add-limits.yml
---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: run-dummy
  namespace: "gke-val-fail"
spec:
  concurrencyPolicy: Forbid
  schedule: 0 0 21 2 0
  jobTemplate:
    spec:
      template:
        metadata:
          name: dummy
        spec:
          restartPolicy: Never
          containers:
          - name: dummy
            image:
              hello-world:latest
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 200m
                memory: 1024Mi
              limits:
                cpu: 200m
                memory: 1024Mi
Just limits, with requests omitted
Execution Log
$ kubectl apply -f cronjob-limits.yml
namespace/gke-val-fail created
Warning: autopilot-default-resources-mutator:Autopilot updated CronJob gke-val-fail/run-dummy: defaulted unspecified resources for containers [dummy] (see http://g.co/gke/autopilot-defaults)
cronjob.batch/run-dummy created
$ kubectl apply -f cronjob-limits.yml
namespace/gke-val-fail unchanged
The CronJob "run-dummy" is invalid: spec.jobTemplate.spec.template.spec.containers[0].resources.requests: Invalid value: "250m": must be less than or equal to cpu limit of 200m
$ 
cronjob-limits.yml
---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: run-dummy
  namespace: "gke-val-fail"
spec:
  concurrencyPolicy: Forbid
  schedule: 0 0 21 2 0
  jobTemplate:
    spec:
      template:
        metadata:
          name: dummy
        spec:
          restartPolicy: Never
          containers:
          - name: dummy
            image:
              hello-world:latest
            imagePullPolicy: Always
            resources:
              limits:
                cpu: 200m
                memory: 1024Mi
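
A possible reading of the limits-only error above, sketched as the container resources the API server would end up validating after the second apply. This is an inference from the error message, not captured output: it assumes the patch only touches fields that appear in the manifest, so the request that Autopilot defaulted on the first apply stays in place while the limit is set back to the manifest value.

```yaml
# Hypothetical merged state after the second apply of cronjob-limits.yml
# (inferred from the error message, not taken from the cluster).
resources:
  limits:
    cpu: 200m       # sent in the patch, taken from cronjob-limits.yml
    memory: 1024Mi  # taken from cronjob-limits.yml
  requests:
    cpu: 250m       # presumably defaulted by Autopilot on the first apply;
                    # absent from the manifest, so the patch leaves it alone
```

With a request of 250m and a limit of 200m, the object would fail the standard Kubernetes rule that a request must not exceed its limit, which matches the "must be less than or equal to cpu limit of 200m" message.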

Interesting. Sorry that I'm basically going by trial and error here; I'm not familiar with this aspect of Autopilot's validation!

When Autopilot mutated your CronJob, what did it change the resource values to?

Looks like GKE Warden mutates it to:

resources:
  limits:
    cpu: 250m
    ephemeral-storage: 1Gi
    memory: 1Gi
  requests:
    cpu: 250m
    ephemeral-storage: 1Gi
    memory: 1Gi

Here is what the original cronjob.yml turns into after the first apply. Running the same command again fails as before:

$ kubectl apply -f cronjob.yml -o yaml
Warning: autopilot-default-resources-mutator:Autopilot updated CronJob gke-val-fail/run-dummy: defaulted unspecified resources for containers [dummy] (see http://g.co/gke/autopilot-defaults)
apiVersion: v1
items:
- apiVersion: v1
  kind: Namespace
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"name":"gke-val-fail"}}
    creationTimestamp: "2024-05-23T18:08:05Z"
    labels:
      kubernetes.io/metadata.name: gke-val-fail
    name: gke-val-fail
    resourceVersion: "69563409"
    uid: 5795d5a2-a0c3-470e-891a-5176a59e99d0
  spec:
    finalizers:
    - kubernetes
  status:
    phase: Active
- apiVersion: batch/v1
  kind: CronJob
  metadata:
    annotations:
      autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"requests":{"cpu":"200m","memory":"1Gi"},"name":"dummy"}]},"output":{"containers":[{"limits":{"cpu":"250m","ephemeral-storage":"1Gi","memory":"1Gi"},"requests":{"cpu":"250m","ephemeral-storage":"1Gi","memory":"1Gi"},"name":"dummy"}]},"modified":true}'
      autopilot.gke.io/warden-version: 2.7.62
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"batch/v1","kind":"CronJob","metadata":{"annotations":{},"name":"run-dummy","namespace":"gke-val-fail"},"spec":{"concurrencyPolicy":"Forbid","jobTemplate":{"spec":{"template":{"metadata":{"name":"dummy"},"spec":{"containers":[{"image":"hello-world:latest","imagePullPolicy":"Always","name":"dummy","resources":{"requests":{"cpu":"200m","memory":"1024Mi"}}}],"restartPolicy":"Never"}}}},"schedule":"0 0 21 2 0"}}
    creationTimestamp: "2024-05-23T18:08:06Z"
    generation: 1
    name: run-dummy
    namespace: gke-val-fail
    resourceVersion: "69563413"
    uid: e889a193-01d1-4fb6-81a8-72d291c65aa8
  spec:
    concurrencyPolicy: Forbid
    failedJobsHistoryLimit: 1
    jobTemplate:
      metadata:
        creationTimestamp: null
      spec:
        template:
          metadata:
            creationTimestamp: null
            name: dummy
          spec:
            containers:
            - image: hello-world:latest
              imagePullPolicy: Always
              name: dummy
              resources:
                limits:
                  cpu: 250m
                  ephemeral-storage: 1Gi
                  memory: 1Gi
                requests:
                  cpu: 250m
                  ephemeral-storage: 1Gi
                  memory: 1Gi
              securityContext:
                capabilities:
                  drop:
                  - NET_RAW
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            dnsPolicy: ClusterFirst
            restartPolicy: Never
            schedulerName: default-scheduler
            securityContext:
              seccompProfile:
                type: RuntimeDefault
            terminationGracePeriodSeconds: 30
            tolerations:
            - effect: NoSchedule
              key: kubernetes.io/arch
              operator: Equal
              value: amd64
    schedule: 0 0 21 2 0
    successfulJobsHistoryLimit: 3
    suspend: false
  status: {}
kind: List
metadata: {}
$ kubectl apply -f cronjob.yml -o yaml
Error from server (GKE Warden constraints violations): error when applying patch:
{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"dummy"}],"containers":[{"name":"dummy","resources":{"requests":{"cpu":"200m","memory":"1024Mi"}}}]}}}}}}
to:
Resource: "batch/v1, Resource=cronjobs", GroupVersionKind: "batch/v1, Kind=CronJob"
Name: "run-dummy", Namespace: "gke-val-fail"
for: "cronjob.yml": error when patching "cronjob.yml": admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-pod-limit-constraints]":["container 'dummy' does not have resource==limits which required in Autopilot clusters.","Total cpu requests for workload 'run-dummy' is lower than the Autopilot minimum required of '250m'."]}
Requested by user: '***REDACTED***', groups: 'system:authenticated'.
$ 
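
One workaround that might be worth trying, untested in this thread: declare in the manifest the same values Autopilot mutated the CronJob to, so that a repeated apply never sends a CPU request below the 250m minimum reported by Warden. The sketch below reuses the names from cronjob.yml; the file name cronjob-min.yml is made up for illustration, and it is an assumption that Warden accepts these values unchanged on both the create and the patch.

```yaml
# cronjob-min.yml (hypothetical file name)
# Same CronJob as cronjob.yml, with requests and limits set to the values
# Autopilot produced after mutation (taken from the output above).
---
apiVersion: v1
kind: Namespace
metadata:
  name: "gke-val-fail"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: run-dummy
  namespace: "gke-val-fail"
spec:
  concurrencyPolicy: Forbid
  schedule: "0 0 21 2 0"
  jobTemplate:
    spec:
      template:
        metadata:
          name: dummy
        spec:
          restartPolicy: Never
          containers:
          - name: dummy
            image: hello-world:latest
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 250m               # Autopilot minimum named in the Warden error
                memory: 1Gi
                ephemeral-storage: 1Gi  # value Autopilot defaulted to above
              limits:                   # kept equal to requests, as Warden demands
                cpu: 250m
                memory: 1Gi
                ephemeral-storage: 1Gi
```

If this holds, the second kubectl apply should have nothing resource-related to patch. It does not explain why the CronJob path rejects a patch that the Deployment path adjusts, so it sidesteps the behavior rather than fixing it.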