kubectl server side diff errors due to field type changes #94275

Closed
jpmsilva opened this issue Aug 27, 2020 · 35 comments
Labels: kind/bug, sig/cli, wg/api-expression

Comments

@jpmsilva

What happened:
Spawned from #94121

Running kubectl diff server-side errors out because the underlying data types differ.

# kubectl diff -f test.yaml --server-side=true
Error from server: failed to create typed patch object: errors:
  .spec.limits[0].default.cpu: expected string, got &value.valueUnstructured{Value:1}
  .spec.limits[0].defaultRequest.cpu: expected string, got &value.valueUnstructured{Value:0.5}

What you expected to happen:
No error should be shown, and the diff (if any) should be shown instead.

How to reproduce it (as minimally and precisely as possible):
Given the following test manifest:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit-range
spec:
  limits:
  - default:
      cpu: 1
      memory: 2Gi
    defaultRequest:
      cpu: 0.5
      memory: 1Gi
    max:
      memory: 8Gi
    type: Container

Apply it to the cluster:

# kubectl apply -f test.yaml

Run a diff server side:

# kubectl diff -f test.yaml --server-side=true
Error from server: failed to create typed patch object: errors:
  .spec.limits[0].default.cpu: expected string, got &value.valueUnstructured{Value:1}
  .spec.limits[0].defaultRequest.cpu: expected string, got &value.valueUnstructured{Value:0.5}

Anything else we need to know?:
Running kubectl apply --dry-run=server -f ... always detects incoming changes, which might be a consequence of this issue (I can't tell for sure):

# kubectl apply -f test.yaml --dry-run=server
limitrange/default-limit-range configured (server dry run)

This issue is related to #94121, where this behavior was first observed.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:04:18Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: baremetal
  • OS (e.g: cat /etc/os-release):
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Kernel (e.g. uname -a): Linux pluto-next 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Network plugin and version (if this is a network-related bug):

  • Others:

@jpmsilva added the kind/bug label Aug 27, 2020
@k8s-ci-robot added the needs-sig label Aug 27, 2020
@jpmsilva
Author

/sig CLI

@k8s-ci-robot added the sig/cli label and removed the needs-sig label Aug 27, 2020
@apelisse
Member

/wg api-expression

@k8s-ci-robot added the wg/api-expression label Aug 27, 2020
@apelisse
Member

Can you try to do: kubectl apply -f test.yaml --server-side=true and tell me what happens?

@jpmsilva
Author

Hi @apelisse

Here's the output of the most complete set of tests I could think of (the one you asked for is the last command), on a clean v1.18.8 cluster:

# kubectl delete -f test.yaml
Error from server (NotFound): error when deleting "test.yaml": limitranges "default-limit-range" not found
# kubectl apply -f test.yaml --server-side=true
Error from server: failed to create typed patch object: errors:
  .spec.limits[0].default.cpu: expected string, got &value.valueUnstructured{Value:1}
  .spec.limits[0].defaultRequest.cpu: expected string, got &value.valueUnstructured{Value:0.5}
# kubectl apply -f test.yaml
limitrange/default-limit-range created
# kubectl apply -f test.yaml --server-side=true
Error from server: failed to create typed patch object: errors:
  .spec.limits[0].default.cpu: expected string, got &value.valueUnstructured{Value:1}
  .spec.limits[0].defaultRequest.cpu: expected string, got &value.valueUnstructured{Value:0.5}

The error appears regardless of whether the object already exists on the cluster, which makes me think it is generated while parsing the local file to build the patch, before anything is even sent to the apiserver.

@apelisse
Member

I was pretty sure we were handling int-or-string properly in SSA... Could it be a problem with LimitRange specifically? I'll look into it.

FYI @lavalamp

@lavalamp
Member

Yeah that's strange, did we break something with the optimizations? cc @jpbetz

@apelisse
Member

Yeah, we should check whether the schema looks the way it should, and if it does, we certainly need a test if we don't have one...

@jpmsilva
Author

I recreated the cluster as version 1.19.0, and here are the results of the previous tests:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:23:04Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
# cat test.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit-range
spec:
  limits:
  - default:
      cpu: 1
      memory: 2Gi
    defaultRequest:
      cpu: 0.5
      memory: 1Gi
    max:
      memory: 8Gi
    type: Container
# kubectl delete -f test.yaml
Error from server (NotFound): error when deleting "test.yaml": limitranges "default-limit-range" not found

Ran as expected

# kubectl diff -f test.yaml --server-side=true
Error from server: failed to create typed patch object: errors:
  .spec.limits[0].default.cpu: expected string, got &value.valueUnstructured{Value:1}
  .spec.limits[0].defaultRequest.cpu: expected string, got &value.valueUnstructured{Value:0.5}

Ran with unexpected error

# kubectl apply -f test.yaml --dry-run=client
limitrange/default-limit-range created (dry run)

Ran as expected

# kubectl apply -f test.yaml --dry-run=server
limitrange/default-limit-range created (server dry run)

Ran as expected

# kubectl apply -f test.yaml --server-side=true
Error from server: failed to create typed patch object: errors:
  .spec.limits[0].default.cpu: expected string, got &value.valueUnstructured{Value:1}
  .spec.limits[0].defaultRequest.cpu: expected string, got &value.valueUnstructured{Value:0.5}

Ran with unexpected error

# kubectl apply -f test.yaml
limitrange/default-limit-range created

Ran as expected

# kubectl diff -f test.yaml --server-side=true
Error from server: failed to create typed patch object: errors:
  .spec.limits[0].default.cpu: expected string, got &value.valueUnstructured{Value:1}
  .spec.limits[0].defaultRequest.cpu: expected string, got &value.valueUnstructured{Value:0.5}

Ran with unexpected error

# kubectl apply -f test.yaml --dry-run=client
limitrange/default-limit-range configured (dry run)

Ran as expected

# kubectl apply -f test.yaml --dry-run=server
limitrange/default-limit-range configured (server dry run)

Ran showing unexpected changes (nothing changed compared to the server version)

# kubectl apply -f test.yaml --server-side=true
Error from server: failed to create typed patch object: errors:
  .spec.limits[0].default.cpu: expected string, got &value.valueUnstructured{Value:1}
  .spec.limits[0].defaultRequest.cpu: expected string, got &value.valueUnstructured{Value:0.5}

Ran with unexpected error

# kubectl apply -f test.yaml
limitrange/default-limit-range configured

Ran with unexpected changes applied (nothing changed compared to the server version)

So, all in all, I think the behavior is unchanged compared to version 1.18.8.

The note after each command is what I believe the expected outcome to be, but I might be mistaken in some cases (such as client-side vs. server-side changes); feel free to correct me.

Hope this helps.

@knight42
Member

@jpmsilva As a workaround, you could change cpu: 1 to cpu: "1" and cpu: 0.5 to cpu: "0.5", since the field's type is, in theory, a string.
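
For reference, the workaround from the comment above applied to the reported manifest would look like this (only the two cpu values change, so the YAML scalars parse as strings):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit-range
spec:
  limits:
  - default:
      cpu: "1"        # quoted: parsed as a string instead of an integer
      memory: 2Gi
    defaultRequest:
      cpu: "0.5"      # quoted: parsed as a string instead of a float
      memory: 1Gi
    max:
      memory: 8Gi
    type: Container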

@knight42
Member

As per https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/swagger.json:

    "io.k8s.apimachinery.pkg.util.intstr.IntOrString": {
      "description": "IntOrString is a type that can hold an int32 or a string.  When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type.  This allows you to have, for example, a JSON field that can accept a name or number.",
      "format": "int-or-string",
      "type": "string"
    },
    "io.k8s.apimachinery.pkg.api.resource.Quantity": {
      "description": "<omitted>",
      "type": "string"
    },

Perhaps we should set the OpenAPISchemaFormat of io.k8s.apimachinery.pkg.api.resource.Quantity to int-or-string as well.
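
A minimal sketch of where that schema comes from, assuming the usual apimachinery import paths and the OpenAPISchemaType/OpenAPISchemaFormat methods those types expose to the kube-openapi generator:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// IntOrString advertises the "int-or-string" format in the generated schema...
	fmt.Println(intstr.IntOrString{}.OpenAPISchemaType(), intstr.IntOrString{}.OpenAPISchemaFormat()) // [string] int-or-string
	// ...while Quantity reports a plain "string" with an empty format, which is
	// why an unquoted number fails the typed-patch check.
	fmt.Println(resource.Quantity{}.OpenAPISchemaType(), resource.Quantity{}.OpenAPISchemaFormat()) // [string] (format is empty)
	// The suggestion above would amount to Quantity's OpenAPISchemaFormat also
	// returning "int-or-string".
}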

@apelisse
Member

Yeah probably, but that doesn't solve the bug about quantity being canonicalized by the apiserver (and forcing possible etcd rewrites).

@knight42
Member

I am not sure whether this is a bug or expected behavior.

@apelisse
Member

> I am not sure whether this is a bug or expected behavior.

What specifically? If you write 0.5 and it's converted to "0.5", Fieldmanager detects a change (because at the time we compute managed fields, we don't yet know that 0.5 and "0.5" are the same thing), updates the managedFields entry, and forces a rewrite in etcd. That's clearly not expected :-/
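
A small standalone sketch of the kind of canonicalization Fieldmanager cannot see through, using apimachinery's resource package (assumed import path; the output noted in the comments is what the canonical DecimalSI form produces):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	applied := resource.MustParse("0.5")
	stored := resource.MustParse("500m")

	// The quantity re-serializes in canonical form, so the string the server
	// stores differs from the one the user wrote even though the values match.
	fmt.Println(applied.String())         // "500m", not the "0.5" that was applied
	fmt.Println(applied.Cmp(stored) == 0) // true: equal values, different spellings
}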

> As a workaround, you could change cpu: 1 to cpu: "1" and cpu: 0.5 to cpu: "0.5", since the field's type is, in theory, a string.

That's a good short-term solution.

@knight42
Member

> What specifically?

I mean accepting an integer value as a quantity and then converting it to a string on the server side.

Given that the value is eventually converted to a string anyway, and I am not that familiar with the code history: if this behavior is a workaround for some previous problem, I think we should stick to string. If not, I think we could set the OpenAPISchemaFormat of io.k8s.apimachinery.pkg.api.resource.Quantity to int-or-string, so that integer values would also pass validation.

@knight42
Member

though I don't think I am experienced enough to decide what to do :/

@apelisse
Member

We must continue to accept the number, that's clearly a bug.

@jpmsilva
Author

Hi @knight42 and @apelisse
Just to confirm: I added the quotes around the cpu fields, and indeed the "expected string, got &value.valueUnstructured" error no longer appears.

If I run apply and then diff (both client-side or both server-side), only the time field shows up as a difference, which is the separate issue #94121.

However, if I apply client side and then diff server side, an error occurs:

#  kubectl delete -f test.yaml
limitrange "default-limit-range" deleted
# kubectl apply -f test.yaml
limitrange/default-limit-range created
# kubectl diff -f test.yaml --server-side=true
W0830 13:09:21.145399 1335824 diff.go:528] Object (/v1, Kind=LimitRange: default-limit-range) keeps changing, diffing without lock
Error from server (Conflict): Apply failed with 1 conflict: conflict with "kubectl-client-side-apply" using v1: .spec.limits

The kubectl-client-side-apply mentioned in the error message appears to be the value of the manager field added automatically by the system:

# kubectl get LimitRange default-limit-range -o yaml
apiVersion: v1
kind: LimitRange
### snip ###
    manager: kubectl-client-side-apply

I did check: when the apply is run server side, the manager field contains the value kubectl instead. And if I reverse the combination (server side apply, then client side diff), the diff does not throw an error.

Do you think this is a problem, and if so, would it warrant a separate issue? For me it is not at all critical, as I am not planning to mix client-side and server-side operations in my configuration management formulas.

@apelisse
Member

That's expected, especially if you don't have 1.19. We've simplified the experience in 1.19 so that client-side and server-side don't conflict with each other.

@jpmsilva
Author

For whatever it's worth (and for anyone who might encounter a similar error message), the execution I showed was with version 1.19.0.

@apelisse
Member

@julianvmodesto Would that be expected?

@julianvmodesto
Contributor

Hm that's not expected... let me check the tests for this.

@nodo
Contributor

nodo commented Sep 17, 2020

/assign

@nodo
Contributor

nodo commented Sep 18, 2020

@apelisse I tried running the reproduction steps again with the change from #94815; here is the result:

$ kubectl --context kind-repro diff -f ~/test.yml --server-side=true --force-conflicts=true

diff -u -N /var/folders/vm/ghtbxsmj3z108zhlzj44873r0000gp/T/LIVE-796717667/v1.LimitRange.default.default-limit-range /var/folders/vm/ghtbxsmj3z108zhlzj44873r0000gp/T/MERGED-436581478/v1.LimitRange.default.default-limit-range
--- /var/folders/vm/ghtbxsmj3z108zhlzj44873r0000gp/T/LIVE-796717667/v1.LimitRange.default.default-limit-range   2020-09-18 15:02:15.918245465 +0200
+++ /var/folders/vm/ghtbxsmj3z108zhlzj44873r0000gp/T/MERGED-436581478/v1.LimitRange.default.default-limit-range 2020-09-18 15:02:15.924263300 +0200
@@ -3,18 +3,24 @@
 metadata:
   annotations:
     kubectl.kubernetes.io/last-applied-configuration: |
-      {"apiVersion":"v1","kind":"LimitRange","metadata":{"annotations":{},"name":"default-limit-range","namespace":"default"},"spec":{"limits":[{"default":{"cpu":1,"memory":"2Gi"},"defaultRequest":{"cpu":0.5,"memory":"1Gi"},"max":{"memory":"8Gi"},"type":"Container"}]}}
+      {"apiVersion":"v1","kind":"LimitRange","metadata":{"name":"default-limit-range","namespace":"default"},"spec":{"limits":[{"default":{"cpu":1,"memory":"2Gi"},"defaultRequest":{"cpu":0.5,"memory":"1Gi"},"max":{"memory":"8Gi"},"type":"Container"}]}}
   creationTimestamp: "2020-09-18T12:57:57Z"
   managedFields:
   - apiVersion: v1
     fieldsType: FieldsV1
     fieldsV1:
+      f:spec:
+        f:limits: {}
+    manager: kubectl
+    operation: Apply
+    time: "2020-09-18T13:02:15Z"
+  - apiVersion: v1
+    fieldsType: FieldsV1
+    fieldsV1:
       f:metadata:
         f:annotations:
           .: {}
           f:kubectl.kubernetes.io/last-applied-configuration: {}
-      f:spec:
-        f:limits: {}
     manager: kubectl-client-side-apply
     operation: Update
     time: "2020-09-18T12:57:57Z"

Without --force-conflicts=true it returns an error:

W0918 15:08:19.205032   11839 diff.go:528] Object (/v1, Kind=LimitRange: default-limit-range) keeps changing, diffing without lock
Error from server (Conflict): Apply failed with 1 conflict: conflict with "kubectl-client-side-apply" using v1: .spec.limits

I haven't started investigating yet and I don't have much context on this topic, but it makes sense to me.

@apelisse
Member

apelisse commented Sep 18, 2020

We only ignore field ownership transfer between SSA and CSA when the values haven't changed, but here they do change because of the canonicalization of quantity. This is a very hard bug to fix; I'm not sure we'll ever be able to solve it. Same as before: apply the canonicalized form to work around the bug.
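
A sketch of what applying the canonicalized form could look like for the manifest from the report, assuming 500m is the form the server stores for 0.5 (as kubectl get shows):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit-range
spec:
  limits:
  - default:
      cpu: "1"         # already canonical; quoted so it parses as a string
      memory: 2Gi
    defaultRequest:
      cpu: 500m        # canonical form of 0.5; a plain YAML string, no quotes needed
      memory: 1Gi
    max:
      memory: 8Gi
    type: Container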

@nodo
Contributor

nodo commented Sep 22, 2020

I am not sure I get it: how does the canonicalization of quantity affect fields? Intuitively it should affect only the value, right?

@apelisse
Member

I'm not sure I understand your question. How does it affect fields?

I agree it only affects the value, but because we don't know these values are equal, we infer that the user changed the field value, and so we move the field ownership around, causing possible conflicts and extra diff output.

@nodo
Contributor

nodo commented Sep 22, 2020

Right! Now it's clear, thanks!

@nodo
Contributor

nodo commented Dec 6, 2020

/unassign

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 6, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 5, 2021
@brycecr

brycecr commented Apr 13, 2021

/remove-lifecycle rotten

Was this fixed by #94815? We're downstream but have started running into this issue, so I'm just wondering whether a fix is coming down the pipe.

@k8s-ci-robot removed the lifecycle/rotten label Apr 13, 2021
@apelisse
Member

Well, it'd be fixed by #94815 but we won't be merging that. We need a better answer. I have some ideas related to that, but nothing easy.

@brycecr

brycecr commented Apr 14, 2021

Anecdotally, I think we started seeing something similar on HorizontalPodAutoscalers, specifically fields like metrics[*].target.averageValue. Is that the same issue?
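
For illustration, a hypothetical fragment of the kind of spec being described (object and metric names are made up; the field layout follows autoscaling/v2beta2, and the unquoted averageValue is the Quantity field in question):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa                 # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment       # hypothetical
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second  # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: 1             # unquoted Quantity; quoting it as "1" avoids the type error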

@apelisse
Member

Yeah, they are using Quantity. Actually, this was fixed in kubernetes/kube-openapi#221, but I think we completely forgot to integrate it into kubernetes :-( Doing that now.

@apelisse
Member

Actually, I was wrong: it was merged into kubernetes as part of #99817. This is fixed in 1.22.
