
Orphaned pod found, but error not a directory occurred when trying to remove the volumes dir #105536

Open
@sgielen

Description

What happened:

One of our nodes had a hard reboot. After this, the following message appeared in the kubelet logs every 2 seconds:

Oct 07 12:46:32 k8s-master2-staging kubelet[7310]: E1007 12:46:32.359145    7310 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"1d4bfc07-3469-4eaa-992f-6d23c17f3aee\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=1

Indeed, the orphaned pod directory exists and contains 1 stale volume directory with a file in it (probably explaining the "not a directory" error):

sjors@k8s-master2-staging:~$ sudo ls -la /var/lib/kubelet/pods/1d4bfc07-3469-4eaa-992f-6d23c17f3aee/volumes/kubernetes.io~csi/pvc-13c81b28-4038-40d5-b6e8-4194e1d7be0e
total 12
drwxr-x--- 2 root root 4096 Oct  7 12:37 .
drwxr-x--- 3 root root 4096 Oct  2 19:37 ..
-rw-r--r-- 1 root root  270 Oct  7 12:37 vol_data.json

Deleting this file manually resolves the kubelet error on its next sweep, 2 seconds later:

Oct 07 12:46:40 k8s-master2-staging kubelet[7310]: I1007 12:46:40.359957    7310 kubelet_volumes.go:160] "Cleaned up orphaned pod volumes dir" podUID=1d4bfc07-3469-4eaa-992f-6d23c17f3aee path="/var/lib/kubelet/pods/1d4bfc07-3469-4eaa-992f-6d23c17f3aee/volumes"
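
For reference, the manual fix was nothing more than removing the stale file; the kubelet's next sweep then cleaned up the now-empty directories, as shown in the log line above. The exact command isn't in the logs, but it amounted to something like:

sudo rm /var/lib/kubelet/pods/1d4bfc07-3469-4eaa-992f-6d23c17f3aee/volumes/kubernetes.io~csi/pvc-13c81b28-4038-40d5-b6e8-4194e1d7be0e/vol_data.json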

What you expected to happen:

I know that the kubelet tries to clean up orphaned pod directories, and various issues around this have been fixed in the past, such as detecting stale mounts and deleting old directories (#60987, for example).

However, it looks like the cleanup fails to remove the stale directory when a regular file is present inside it.
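
A minimal console sketch of the failure mode (this is not kubelet code; it only demonstrates that treating a regular file as a directory fails with ENOTDIR, which is the "not a directory" in the log above; all paths here are made up):

mkdir -p /tmp/volumes/kubernetes.io~csi/pvc-demo
touch /tmp/volumes/kubernetes.io~csi/pvc-demo/vol_data.json
rmdir /tmp/volumes/kubernetes.io~csi/pvc-demo/vol_data.json
# rmdir: failed to remove '.../vol_data.json': Not a directory
ls /tmp/volumes/kubernetes.io~csi/pvc-demo/vol_data.json/
# ls: cannot access '.../vol_data.json/': Not a directory

Any cleanup pass that assumes everything under the pod's volumes directory is a directory will hit the same errno on this file and abort.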

How to reproduce it (as minimally and precisely as possible):

I think, though I have not tried it, that this can be reproduced with the following steps (see the console sketch after this list):

  1. start a Pod with a mounted PVC
  2. manually create a file inside the Pod's volumes/kubernetes.io~csi/pvc-... directory
  3. hard-reboot the machine, so that the kubelet gets no chance to clean up the Pod directory, i.e. the directory becomes orphaned
  4. (If creating the file in step 2 was not possible before the reboot, it can also be done after the reboot, but before the kubelet comes up.)
  5. observe the error in the kubelet logs.
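
A console sketch of these steps (the pod name, manifest, and exact pvc directory name are illustrative, and sysrq-b is just one way to force a hard reboot):

# 1. start a Pod with a mounted PVC
kubectl apply -f pod-with-pvc.yaml

# 2. on the node running the Pod, drop a regular file into its CSI volume dir
POD_UID=$(kubectl get pod pod-with-pvc -o jsonpath='{.metadata.uid}')
sudo sh -c "touch /var/lib/kubelet/pods/$POD_UID/volumes/kubernetes.io~csi/pvc-*/stray-file"

# 3. hard-reboot the machine so the kubelet cannot tear the Pod down cleanly
echo b | sudo tee /proc/sysrq-trigger

# 5. after the reboot, watch the kubelet logs for the orphaned-pod error
sudo journalctl -u kubelet -f | grep orphaned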

Anything else we need to know?:

The CSI provider used is Hetzner Cloud (hcloud-csi-driver:1.6.0); I'm not sure whether it is responsible for creating the file.
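
In case it helps triage, the contents of the stray file can be dumped before removing it (just a diagnostic suggestion; the path is the one from the listing above):

sudo cat /var/lib/kubelet/pods/1d4bfc07-3469-4eaa-992f-6d23c17f3aee/volumes/kubernetes.io~csi/pvc-13c81b28-4038-40d5-b6e8-4194e1d7be0e/vol_data.json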

Environment:

  • Kubernetes version (use kubectl version): 1.22.1
  • Cloud provider or hardware configuration: bare-metal, 3 masters
  • OS (e.g: cat /etc/os-release): Ubuntu 20.04.3 LTS
  • Kernel (e.g. uname -a): 5.4.0-88-generic
  • Install tools: kubeadm 1.22.1

Labels

kind/bug, needs-triage, priority/important-soon, sig/storage
