Description
What happened:
One of our nodes had a hard reboot. Afterwards, the following message appeared in the kubelet logs every 2 seconds:
Oct 07 12:46:32 k8s-master2-staging kubelet[7310]: E1007 12:46:32.359145 7310 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"1d4bfc07-3469-4eaa-992f-6d23c17f3aee\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=1
Indeed, the orphaned pod directory exists and contains 1 stale volume directory with a file in it (probably explaining the "not a directory" error):
sjors@k8s-master2-staging:~$ sudo ls -la /var/lib/kubelet/pods/1d4bfc07-3469-4eaa-992f-6d23c17f3aee/volumes/kubernetes.io~csi/pvc-13c81b28-4038-40d5-b6e8-4194e1d7be0e
total 12
drwxr-x--- 2 root root 4096 Oct 7 12:37 .
drwxr-x--- 3 root root 4096 Oct 2 19:37 ..
-rw-r--r-- 1 root root 270 Oct 7 12:37 vol_data.json
Manually deleting this file resolves the kubelet error on the next cleanup pass, 2 seconds later:
Oct 07 12:46:40 k8s-master2-staging kubelet[7310]: I1007 12:46:40.359957 7310 kubelet_volumes.go:160] "Cleaned up orphaned pod volumes dir" podUID=1d4bfc07-3469-4eaa-992f-6d23c17f3aee path="/var/lib/kubelet/pods/1d4bfc07-3469-4eaa-992f-6d23c17f3aee/volumes"
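For completeness, this is the manual workaround I applied, written out as commands. The pod UID and PVC name are the ones from this incident and will differ in other cases; inspecting vol_data.json first is optional:

# Workaround sketch based on what worked here; adjust pod UID and PVC directory to your case.
POD_UID=1d4bfc07-3469-4eaa-992f-6d23c17f3aee
VOL_DIR=/var/lib/kubelet/pods/${POD_UID}/volumes/kubernetes.io~csi/pvc-13c81b28-4038-40d5-b6e8-4194e1d7be0e

# Optional: look at the stale CSI volume metadata before deleting it.
sudo cat ${VOL_DIR}/vol_data.json

# Remove the stale file so the kubelet can clean up the orphaned volumes dir on its next pass.
sudo rm ${VOL_DIR}/vol_data.json

# Confirm the cleanup in the kubelet logs.
sudo journalctl -u kubelet --since "5 minutes ago" | grep -i orphaned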
What you expected to happen:
I know that the kubelet tries to clean up orphaned pod directories, and various issues in this area have been fixed in the past, such as detecting stale mounts and deleting old directories (#60987, for example).
However, it looks like the cleanup fails to remove the stale volume directory when a file is still present inside it.
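I have not checked the kubelet code path involved, but the observed behaviour (removal only succeeds once the file is gone) is consistent with a non-recursive directory removal. A minimal illustration outside of Kubernetes, not the kubelet's actual code, and note the kubelet reports a different error string ("not a directory"):

# Illustration only: a leftover file blocks non-recursive removal of the directory.
mkdir -p /tmp/orphan-demo/pvc-demo
touch /tmp/orphan-demo/pvc-demo/vol_data.json
rmdir /tmp/orphan-demo/pvc-demo      # fails: "Directory not empty"
rm /tmp/orphan-demo/pvc-demo/vol_data.json
rmdir /tmp/orphan-demo/pvc-demo      # succeeds once the file is gone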
How to reproduce it (as minimally and precisely as possible):
I think, but have not attempted, that it can be reproduced with these steps (see the untested sketch after this list):
- start a Pod with a mounted PVC
- manually create a file inside the Pod's volumes/kubernetes.io~csi/pvc-... directory
- hard-reboot the machine, so that the kubelet gets no chance to clean up the Pod directory, i.e. the directory becomes orphaned
- (If creating the file in step 2 was not possible before the reboot, it can also be done after the reboot, but before the kubelet comes up.)
- observe the error in the kubelet logs
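Untested reproduction sketch of the steps above, assuming a cluster with a default CSI-backed StorageClass; the names repro-pvc/repro-pod and the sysrq-based hard reboot are only illustrative choices:

# 1. Create a PVC and a Pod that mounts it (names are made up for this sketch).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: repro-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: repro-pvc
EOF
kubectl wait --for=condition=Ready pod/repro-pod

# 2. On the node running the pod: drop an extra file into the pod's CSI volume dir.
POD_UID=$(kubectl get pod repro-pod -o jsonpath='{.metadata.uid}')
sudo sh -c "touch /var/lib/kubelet/pods/${POD_UID}/volumes/kubernetes.io~csi/pvc-*/extra-file"

# 3. Hard-reboot the node so the kubelet cannot clean up the pod directory
#    (requires sysrq to be enabled; any hard power-off should work too).
echo b | sudo tee /proc/sysrq-trigger

# 4. After the node is back up, watch for the orphaned-pod error.
sudo journalctl -u kubelet -f | grep -i orphaned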
Anything else we need to know?:
The CSI provider used is Hetzner Cloud (hcloud-csi-driver:1.6.0); I'm not sure whether it is responsible for creating the file.
Environment:
- Kubernetes version (use kubectl version): 1.22.1
- Cloud provider or hardware configuration: bare-metal, 3 masters
- OS (e.g. cat /etc/os-release): Ubuntu 20.04.3 LTS
- Kernel (e.g. uname -a): 5.4.0-88-generic
- Install tools: kubeadm 1.22.1