Autopilot node now stuck in Ready,SchedulingDisabled state

Hi,

I upgraded a small autopilot cluster yesterday to v1.25.4-gke.2100. This seems to have upgraded the control plane part of the cluster, but not the nodes themselves, which remain on v1.24.7-gke.900.

There doesn't seem to be a way to manually trigger them to update, so I guess I just wait?

Anyhow, as part of performing the upgrade a new node was fired up, meaning that my cluster now has 3 nodes (for the last 6 months my system has happily sat there running on just two). I took a look at all of the running pods on this new node and filtered out anything that appeared to belong to a DaemonSet, which left just a kube-dns pod.
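For reference, this is roughly how I checked (the node and pod names below are placeholders rather than my real ones):

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=gk3-example-node-1

# prints the pod's immediate owner kind - DaemonSet-managed pods show "DaemonSet" here,
# while kube-dns shows "ReplicaSet" (it comes from a Deployment)
kubectl get pod -n kube-system kube-dns-abc123 -o jsonpath='{.metadata.ownerReferences[0].kind}'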

I wondered whether I could somehow trigger this to move back onto one of the other two nodes, so without really thinking about it I ran kubectl drain --ignore-daemonsets against the node as if this were a non-Autopilot cluster. This cordoned the node (as I would have expected, before I remembered it was an Autopilot cluster), but then failed to evict the kube-dns pod because the kube-system namespace is managed in an Autopilot cluster, meaning that my request can't create a pods/eviction resource there.
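For the record, what I ran was along these lines (the node name here is just an example):

kubectl drain gk3-example-node-1 --ignore-daemonsets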

Ok... so I'll simply uncordon the node and ponder a bit more. Only I can't uncordon the node, because that's 'denied by autogke-no-node-updates: Uncordon on nodes is not allowed in Autopilot.'
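i.e. roughly this, again with a placeholder node name:

kubectl uncordon gk3-example-node-1
# -> denied by autogke-no-node-updates: Uncordon on nodes is not allowed in Autopilot.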

So now I still have 3 nodes in my cluster when I'm pretty confident it will run ok on just 2. And the node I'm trying to get rid of is now stuck in SchedulingDisabled and I can't re-enable scheduling on it.

Should I be able to cordon an autopilot node in the first place?

Why isn't the node autoscaling getting rid of this third node altogether? When I deploy my application it will often create a third node, put the new instance of the application onto that new node and terminate the original instance... then after a short while it seems to realise that it can consolidate that pod back onto the original two nodes and repeats the process to make that happen, at which point the new node gets shut down. Why isn't that happening here for the kube-dns pod?

 

Thanks


1) Autopilot will eventually automatically upgrade the nodes to match the control plane version.  As you mentioned, you can't trigger a node upgrade yourself.

2) We just recently introduced support for "kubectl drain" ( https://cloud.google.com/kubernetes-engine/docs/troubleshooting/troubleshooting-autopilot-clusters#u... ) on Autopilot. At least one instance of "kube-dns" needs to be running in the cluster, so at this point, you'll likely just need to wait for Autopilot to clean up / recreate the node.
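If you want to double-check where kube-dns is currently running, something like this should work (assuming the usual k8s-app=kube-dns label on the kube-dns pods):

kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide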

Is there an issue with the node staying around?  

@garisingh thanks for taking some time to get back to me

In relation to the node upgrade, what sort of time period does this happen over? It must be at least 20 hours since I upgraded the control plane, which feels like something is broken, but perhaps this is just the expected behaviour?
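For context, this is roughly how I'm checking the versions at the moment:

# control plane version (shows v1.25.4-gke.2100 for me)
kubectl version
# node versions - all three nodes still report v1.24.7-gke.900
kubectl get nodes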

Support for kubectl drain is interesting and obviously explains why I could run it and cordon the node. I can see how it may be useful on nodes running everything under my control which is why I attempted it in the first place. But in my current situation it feels like it's not quite enough.

Since I can't use it to evict the kube-dns pod, what happens now? Assuming the kube-dns pod doesn't die of its own accord and end up scheduled elsewhere, will this node simply continue to hang around until I next update the control plane (assuming that's the only time a new kube-dns pod might come into existence) or until the nodes are upgraded?

With my own app it's super efficient at consolidating it onto the smallest set of nodes. Within a couple of minutes of starting a new node to run the new instance of my app, it has recreated that pod on the busier nodes, evicted the new one and shut the new node down again (at least I assume that's what's happening, since the same thing occurs almost every time I update my application and roll out the new container). The same doesn't seem to apply to kube-dns, or at least that's not what I'm seeing happen.
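(I haven't traced this in any detail; I'm just watching something like the following during a rollout, where app=my-app stands in for my real label:)

kubectl get pods -l app=my-app -o wide --watch
kubectl get nodes --watch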

I guess the only issue is potential cost? I'm sure that when the cluster first scaled itself back to just two instances there was a drop in compute cost. I can't confirm that though, as my billing hasn't updated to include yesterday yet (where I would expect to see a rise in my compute cost now that the extra node exists). It's highly possible I'm wrong though; I struggled to understand the documentation and the pricing model for this stuff when I first looked at it. Perhaps this is not an issue.

Thanks

 

I'm not sure of the exact amount of time it will take for your nodes to upgrade. It can vary depending on what we may also be rolling out.

In terms of cost, you don't pay for the nodes. You only pay for the resources used by your deployments. So even if the node with kube-dns hangs around, you are not paying for it.

Thanks for the further confirmation @garisingh

For anyone else that may read this in the future it has taken almost 3.5 days from upgrading my control plane for my nodes to begin upgrading.

You're also correct in relation to costs. Sorry for my mistake; I must have performed some other operation previously that also removed some pods or something, and mistakenly thought that the additional node running at the time was responsible for some of the cost that went away.

Thanks for the help

Same problem here: a GKE Autopilot node on version v1.27.8-gke.1067004 has been stuck in the Ready,SchedulingDisabled state for 44 days after a node drain. The other nodes in the cluster are at version v1.29.1-gke.1589018.
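For reference, this is roughly how it looks on my side (node name anonymised; the taint is what a cordon normally leaves behind):

kubectl get nodes
# -> one node Ready,SchedulingDisabled on v1.27.8-gke.1067004, the others Ready on v1.29.1-gke.1589018
kubectl describe node gk3-example-node-2 | grep -i -A2 taints
# -> node.kubernetes.io/unschedulable:NoSchedule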
