
In case of node reboot, the pod running on that node goes to "Unknown" state, as kubelet fails to attach the associated PVC #119401

Closed

Description

@akankshapanse

What happened?

Given a Kubernetes cluster with Windows worker nodes and the vSphere CSI driver installed, when a worker node is rebooted, a pod running on that node moves from "Running" to "Unknown" state and remains in "Unknown" state indefinitely.
In both this case and the forced-deletion case described below, the pod description shows the following errors:

  Warning  FailedMount  9m11s (x2131 over 3d)  kubelet  MountVolume.MountDevice failed for volume "pvc-X-X-X-X-X" : kubernetes.io/csi: attacher.MountDevice failed to create dir "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\csi.vsphere.vmware.com\\XXXXX\\globalmount":  mkdir \var\lib\kubelet\plugins\kubernetes.io\csi\csi.vsphere.vmware.com\XXXXX\globalmount: Cannot create a file when that file already exists.
  Warning  FailedMount  3m42s (x1495 over 3d)  kubelet  Unable to attach or mount volumes: unmounted volumes=[<abc>], unattached volumes=[<abc> kube-api-access-8dqhl]: timed out waiting for the condition

If the pod in this state is deleted forcefully and gets rescheduled on the same node, it does not reach "Running" state but remains stuck in "ContainerCreating" state, with the same errors in the pod description as above.

The errors indicate that during MountDevice() in csi_attacher.go, kubelet fails to create the staging target path directory because the directory already exists.

Before the change made in PR #88759, kubelet checked whether the staging target directory was already present and mounted. That check was removed on the grounds that mount-point checks should be done at the CSI driver level, not generically in kubelet (a rough sketch of such a check appears after the code excerpts below).

However, in cases of node shutdown/reboot, unmounting and removing the staging target path directory may not be possible or may fail for various reasons, leaving the staging directory on the node as it is. So, when the pod gets rescheduled on that same node and kubelet tries to create the directory again, it fails at:

if err = os.MkdirAll(deviceMountPath, 0750); err != nil {
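
For illustration, this MkdirAll failure can be reproduced outside kubelet when the entry left at the staging path no longer resolves, for example a dangling symlink left behind by an unclean shutdown (a minimal, self-contained sketch; the path and the dangling-symlink scenario are assumptions for illustration, not taken from the kubelet code):

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

func main() {
    // Hypothetical staging path, for illustration only.
    staging := filepath.Join(os.TempDir(), "globalmount-demo")
    _ = os.Remove(staging)

    // Simulate what a reboot can leave behind: an entry at the staging path
    // that no longer resolves (here, a symlink to a missing volume target).
    if err := os.Symlink("/nonexistent-volume", staging); err != nil {
        panic(err)
    }

    // os.Stat cannot resolve the dangling link, so os.MkdirAll falls through
    // to os.Mkdir, which fails because the name is already taken. On Windows
    // the same condition surfaces as "Cannot create a file when that file
    // already exists."
    err := os.MkdirAll(staging, 0750)
    fmt.Println(err) // e.g. "mkdir /tmp/globalmount-demo: file exists"
}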

A similar MkdirAll call exists in SetUpAt() in csi_mounter.go:

if err := os.MkdirAll(parentDir, 0750); err != nil {

which can lead to the same issue before NodePublishVolume().
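
For reference, below is a rough sketch of the kind of pre-existing-mount check described above, written against k8s.io/mount-utils; whether this matches the code removed in PR #88759 is an assumption, and the helper name is hypothetical:

package main

import (
    "fmt"
    "os"

    mount "k8s.io/mount-utils"
)

// deviceMountPathReady reports whether the staging path is already a mount
// point (hypothetical helper; not the actual removed kubelet code).
func deviceMountPathReady(mounter mount.Interface, deviceMountPath string) (bool, error) {
    notMnt, err := mounter.IsLikelyNotMountPoint(deviceMountPath)
    if os.IsNotExist(err) {
        // Nothing at the path yet: safe to MkdirAll and proceed with staging.
        return false, nil
    }
    if err != nil {
        return false, err
    }
    // Already mounted: directory creation and MountDevice() work can be skipped.
    return !notMnt, nil
}

func main() {
    ready, err := deviceMountPathReady(mount.New(""), "/var/lib/kubelet/plugins/demo/globalmount")
    fmt.Println(ready, err)
}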

The issue was originally observed with a Windows worker node. However, this kubelet code path is common across operating systems, so the issue can occur on Linux worker nodes as well.

What did you expect to happen?

kubelet should handle node reboot/restart cases, where the staging target path directory for the PVC may already be present on the node, by letting the corresponding CSI driver handle further processing of that target directory in NodeStageVolume() and NodePublishVolume(), similar to the earlier commit made for another issue (Reference: https://github.com/kubernetes/kubernetes/pull/88569/files#diff-227f84916ffb93ece42ccaec840af8ea265714440c15c45f42b08d6a427a57bfR319).
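
A minimal sketch of that expected behavior (the helper name and placement are hypothetical, not an actual kubelet change): tolerate whatever already exists at the staging path instead of failing, and let the CSI driver validate or re-create the mount during NodeStageVolume():

package main

import (
    "fmt"
    "os"
)

// ensureStagingPath creates the staging directory only when nothing exists at
// the path; any leftover entry (directory, symlink, or stale mount point from
// a reboot) is left for the CSI driver to handle. Hypothetical sketch.
func ensureStagingPath(path string) error {
    if _, err := os.Lstat(path); err == nil {
        // Something already occupies the path: do not fail here; the CSI
        // driver owns validation and re-mounting of the staging target.
        return nil
    } else if !os.IsNotExist(err) {
        return err
    }
    return os.MkdirAll(path, 0750)
}

func main() {
    fmt.Println(ensureStagingPath("/tmp/globalmount-demo"))
}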

How can we reproduce it (as minimally and precisely as possible)?

Reboot, restart, or power off and on a worker node that is running a pod with an attached CSI-backed PVC.

Anything else we need to know?

No response

Kubernetes version

v1.24.7

Cloud provider

On-premise datacenter

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

CSI plugin: vSphere CSI driver built from the latest source code

Metadata

Labels

kind/bug: Categorizes issue or PR as related to a bug.
sig/storage: Categorizes an issue or PR as relevant to SIG Storage.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
