Description
What happened?
Given a Kubernetes cluster with Windows worker nodes and the vSphere CSI driver installed, when a worker node is rebooted, a pod running on the restarting node goes from the "Running" state to the "Unknown" state and remains in "Unknown" forever.
The errors seen in the corresponding pod description are shown below:
Warning FailedMount 9m11s (x2131 over 3d) kubelet MountVolume.MountDevice failed for volume "pvc-X-X-X-X-X" : kubernetes.io/csi: attacher.MountDevice failed to create dir "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\csi.vsphere.vmware.com\\XXXXX\\globalmount": mkdir \var\lib\kubelet\plugins\kubernetes.io\csi\csi.vsphere.vmware.com\XXXXX\globalmount: Cannot create a file when that file already exists.
Warning FailedMount 3m42s (x1495 over 3d) kubelet Unable to attach or mount volumes: unmounted volumes=[<abc>], unattached volumes=[<abc> kube-api-access-8dqhl]: timed out waiting for the condition
If a pod in this state is deleted forcefully and gets rescheduled on the same node, it does not return to the "Running" state but remains in the "ContainerCreating" state forever, with the same error in the pod description as in the case above.
The error above states that during MountDevice() in csi_attacher.go, kubelet fails to create the staging target path directory because the directory already exists.
Before the change made by PR #88759, kubelet checked whether the staging target directory was already present and mounted, but that check was removed on the grounds that mount-point-related checks should be done at the CSI driver level rather than generically in kubelet.
However, in case of a node shutdown/reboot, unmounting and removing the staging target path directory may not be possible or may fail for various reasons, which leaves the staging directory on that node as it is. So, when the pod gets re-scheduled on that same node and kubelet tries to create the directory again, it fails at kubernetes/pkg/volume/csi/csi_attacher.go, line 342 in a8b90c9.
Similar code exists in SetUpAt() in csi_mounter.go (kubernetes/pkg/volume/csi/csi_mounter.go, line 211 in a8b90c9).
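For illustration only, here is a minimal standalone sketch (assumed paths, not kubelet code) of the OS-level behaviour behind the error above: a plain mkdir on a path that survived the reboot fails, and on Windows the failure surfaces as "Cannot create a file when that file already exists".

```go
// Minimal standalone sketch (not kubelet code): recreating a directory that
// already exists fails; on Windows the error text is
// "Cannot create a file when that file already exists".
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Stand-in for the CSI staging target path (".../globalmount").
	stagingPath := filepath.Join(os.TempDir(), "globalmount")

	// First creation succeeds, simulating the directory left behind on the
	// node from before the reboot.
	if err := os.Mkdir(stagingPath, 0750); err != nil {
		fmt.Println("initial mkdir:", err)
	}

	// A second plain mkdir on the same path fails because the directory is
	// still present - the situation kubelet runs into after the node is back.
	if err := os.Mkdir(stagingPath, 0750); err != nil {
		fmt.Println("mkdir on existing path:", err)
	}
}
```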
The issue was originally seen in the context of a Windows worker node. However, this kubelet code path is common across OS flavors, so the issue can occur with Linux worker nodes as well.
What did you expect to happen?
kubelet should handle cases like node reboot/restart, where the staging target path directory for the PVC may already be present on the node, and let the corresponding CSI driver handle further processing of that target directory in NodeStageVolume() and NodePublishVolume(), similar to the earlier commit made for another issue (Reference: https://github.com/kubernetes/kubernetes/pull/88569/files#diff-227f84916ffb93ece42ccaec840af8ea265714440c15c45f42b08d6a427a57bfR319).
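For illustration, a rough sketch of the kind of pre-check described above might look like the following (ensureStagingDir is a hypothetical helper name and the use of k8s.io/mount-utils is an assumption; this is not the actual kubelet implementation):

```go
// Rough sketch of the expected handling (hypothetical helper, not kubelet
// code): if the staging target path already exists, do not fail; optionally
// check whether it is still a mount point and leave further handling to the
// CSI driver's NodeStageVolume()/NodePublishVolume().
package main

import (
	"fmt"
	"os"
	"path/filepath"

	mountutils "k8s.io/mount-utils"
)

// ensureStagingDir is a hypothetical name used only for this sketch.
func ensureStagingDir(path string) error {
	if _, err := os.Stat(path); err == nil {
		// Path is already present (e.g. left over from before a node reboot).
		// Check whether something is still mounted there and reuse the
		// directory instead of failing on mkdir.
		notMnt, mntErr := mountutils.New("").IsLikelyNotMountPoint(path)
		if mntErr != nil {
			return fmt.Errorf("checking mount point %s: %v", path, mntErr)
		}
		fmt.Printf("staging path %s already exists (not a mount point: %v), reusing it\n", path, notMnt)
		return nil
	} else if !os.IsNotExist(err) {
		return fmt.Errorf("stat %s: %v", path, err)
	}
	// Normal case: the path does not exist yet, so create it.
	return os.MkdirAll(path, 0750)
}

func main() {
	if err := ensureStagingDir(filepath.Join(os.TempDir(), "globalmount")); err != nil {
		fmt.Println("error:", err)
	}
}
```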
How can we reproduce it (as minimally and precisely as possible)?
Execute a node reboot, restart, or power-off/power-on scenario on a worker node that is running a pod with a vSphere CSI volume.
Anything else we need to know?
No response
Kubernetes version
v1.24.7
Cloud provider
On-premise datacenter
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
CSI plugin: vSphere CSI driver with latest source code