feat: ignore mount dir check in csi node stage/publish #88759
Conversation
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-kind-ipv6
@@ -217,12 +201,6 @@ func (c *csiMountMgr) SetUpAt(dir string, mounterArgs volume.MounterArgs) error
return fmt.Errorf("volume source not found in volume.Spec")
}

// create target_dir before call to NodePublish
Thank you for the patch, @andyzhangx. I noticed that we used to call `os.MkdirAll` here, but now we won't. Can you clarify whether the CSI plugin should still find that this target directory already exists, due to it being created earlier, or are we transferring new responsibility to drivers to create this target directory?
I see. In my driver, and those that I've studied, we create the target path in the `NodePublishVolume` handler if the directory doesn't already exist, so I wouldn't mind this change in behavior.
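For illustration, a minimal sketch of that driver-side pattern (hypothetical handler and names, not taken from any particular driver; assumes the CSI Go bindings and the gRPC status packages):

```go
package driver

import (
	"context"
	"os"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// nodeServer is a stand-in for a driver's CSI node service.
type nodeServer struct{}

// NodePublishVolume creates target_path itself if it does not already exist,
// instead of assuming the CO (kubelet) created it.
func (ns *nodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
	target := req.GetTargetPath()
	if target == "" {
		return nil, status.Error(codes.InvalidArgument, "target_path missing in request")
	}
	// Creation of target_path is the responsibility of the SP per the CSI spec.
	if err := os.MkdirAll(target, 0750); err != nil {
		return nil, status.Errorf(codes.Internal, "failed to create target path %q: %v", target, err)
	}
	// ... perform the actual (bind) mount onto target here ...
	return &csi.NodePublishVolumeResponse{}, nil
}
```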
CSI says:

// The CO SHALL ensure that the parent directory of this path exists
// and that the process serving the request has `read` and `write`
// permissions to that parent directory.
...
// Creation of target_path is the responsibility of the SP.

Should kubelet `MkdirAll()` the parent then?
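As a sketch of that option (hypothetical helper name; the PR later inlines the equivalent logic in SetUpAt), kubelet would create only the parent and leave target_path itself to the driver:

```go
package csi

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensurePublishParentDir creates the parent directory of the NodePublish
// target_path and leaves creation of target_path itself to the CSI driver,
// matching the spec wording quoted above. Hypothetical helper name.
func ensurePublishParentDir(targetPath string) error {
	parentDir := filepath.Dir(targetPath)
	if err := os.MkdirAll(parentDir, 0750); err != nil {
		return fmt.Errorf("failed to create parent dir %q for target path %q: %v", parentDir, targetPath, err)
	}
	return nil
}
```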
fixed
I had a similar thought while getting the Azure Disk CSI driver to work on Windows. But there are other drivers (e.g. https://github.com/kubernetes-sigs/secrets-store-csi-driver/blob/master/pkg/secrets-store) which assume the directory exists. Hence I chose to work around this issue in the Windows Azure Disk CSI driver. The workaround is not clean (it has to delete the target directory so that a link can be created), but it works; see the sketch below.
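Roughly, the workaround looks like this (hypothetical sketch of the approach described above, not the actual driver code):

```go
package driver

import "os"

// linkTargetToStaging sketches the described workaround: remove the target
// directory that kubelet pre-created, then create a symlink to the staged
// path in its place. Hypothetical helper; the real Windows code differs.
func linkTargetToStaging(stagingPath, targetPath string) error {
	if err := os.Remove(targetPath); err != nil && !os.IsNotExist(err) {
		return err
	}
	return os.Symlink(stagingPath, targetPath)
}
```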
My recommendation is that we hold off on the change to not creating the target until the 1.18 release rolls out. Make the change early in 1.19 and inform folks via the SIG Storage channels (Slack, mailing list, meetings), giving them the opportunity to work out the changes by the time 1.19 comes out. Thoughts?
Sounds good. How about we at least make a deprecation notice this release?
Are we mainly concerned with removal of mount path creation?
Do we want to proceed with removal of mount point check?
Sounds good. How about we at least make a deprecation notice this release?
Yes, SGTM 👍
I think we should put all three items in the deprecation notice:
- Creation of the mount directory if not present,
- Check whether it's mounted,
- Corruption check
all getting deprecated in the release note, and proceed with the removal at the beginning of the 1.19 cycle. Although I don't know which drivers would break on (2) and (3), it looks like an easy way to break things late in the cycle with this.
/milestone v1.18
/test pull-kubernetes-e2e-gce-csi-serial
/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-gce
I like removing the mount check, however, I am not so sure about removing the `Mkdir`s - it needs to be done very carefully.
pkg/volume/csi/csi_attacher.go
Outdated
dataDir := filepath.Dir(deviceMountPath)
if err = os.MkdirAll(dataDir, 0750); err != nil {
`deviceMountPath` is used as `NodeStageVolumeRequest.staging_target_path`. CSI says:

// The CO SHALL ensure
// that the path is directory and that the process serving the
// request has `read` and `write` permission to that directory. The
// CO SHALL be responsible for creating the directory if it does not
// exist.
string staging_target_path = 3;

So, Kubernetes should `MkDir` the whole `deviceMountPath`. Or CSI needs to be fixed. Or worked around, as we do with block volumes (CSI drivers stage block devices as files inside the `staging_target_path` directory).
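Expressed as a sketch (hypothetical helper; the real logic lives in csiAttacher.MountDevice), the ordering implied by the spec is: create the whole staging path up front, and on a final staging error remove that same path:

```go
package csi

import (
	"fmt"
	"os"
)

// stageVolume creates the whole staging_target_path (deviceMountPath) before
// NodeStageVolume and removes that same path again if staging fails with a
// final error. Hypothetical helper illustrating the ordering discussed above.
func stageVolume(deviceMountPath string, nodeStage func(stagingPath string) error) error {
	if err := os.MkdirAll(deviceMountPath, 0750); err != nil {
		return fmt.Errorf("failed to create staging dir %q: %v", deviceMountPath, err)
	}
	if err := nodeStage(deviceMountPath); err != nil {
		// Clean up the whole deviceMountPath, not only its parent or a data dir.
		if rmErr := os.RemoveAll(deviceMountPath); rmErr != nil {
			return fmt.Errorf("NodeStageVolume failed: %v; cleanup of %q also failed: %v", err, deviceMountPath, rmErr)
		}
		return fmt.Errorf("NodeStageVolume failed: %v", err)
	}
	return nil
}
```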
are these addressed yet?
pkg/volume/csi/csi_attacher.go
Outdated
@@ -315,7 +298,7 @@ func (c *csiAttacher) MountDevice(spec *volume.Spec, devicePath string, deviceMo
if err != nil && volumetypes.IsOperationFinishedError(err) {
// clean up metadata
klog.Errorf(log("attacher.MountDevice failed: %v", err))
if err := removeMountDir(c.plugin, deviceMountPath); err != nil {
if err := removeMountDir(c.plugin, dataDir); err != nil {
Again, whole deviceMountPath should be removed.
This still needs to be addressed.
This was fixed.
@jsafrane addressed your comments, PTAL, thanks.
fixed
/test pull-kubernetes-e2e-gce-storage-snapshot
Hello @andyzhangx, @jsafrane!
/hold cancel
pkg/volume/csi/csi_mounter.go
Outdated
@@ -218,8 +202,13 @@ func (c *csiMountMgr) SetUpAt(dir string, mounterArgs volume.MounterArgs) error
}

// create target_dir before call to NodePublish
if err := os.MkdirAll(dir, 0750); err != nil && !corruptedDir {
return errors.New(log("mounter.SetUpAt failed to create dir %#v: %v", dir, err))
if err := os.MkdirAll(dir, 0750); err != nil {
We should Mkdir the parent of `dir` to align with the CSI spec wording. I believe we then don't need the `corruptedDir` check, because the parent dir is out of the driver's control.
Added three commits fixing remaining issues with this PR:
And tested a bit on AWS.
@@ -568,7 +568,7 @@ func (og *operationGenerator) GenerateMountVolumeFunc(
DevicePath: devicePath,
}

if volumeDeviceMounter != nil {
if volumeDeviceMounter != nil && actualStateOfWorld.GetDeviceMountState(volumeToMount.VolumeName) != DeviceGloballyMounted {
Is it possible to add unit tests for these changes?
operation_generator.go does not have much unit test coverage; I'll try to test it manually with a mock driver plus some error injection.
@jsafrane we have the Azure File CSI driver that has implemented `NodeStageVolume`, and with this code change it only calls `NodeStageVolume` once; it would not work if that volume is going to be mounted on multiple nodes. Shall we revert this code change? Even if `NodeStageVolume` is called multiple times on the same node, the CSI driver should ensure mounting on the same target works, right? Find details here: kubernetes-sigs/azurefile-csi-driver#620
I am not sure I understand. The code above makes sure that `NodeStageVolume` should be called only once on each node where a shared volume is used. The volume can still be used on multiple nodes.
From what I observe, there is a small possibility that `NodeStageVolume` is not invoked and only `NodePublishVolume` is invoked, and it only happens from k8s v1.20.x. Do you know what could be the root cause? Thanks.
I0502 14:51:38.501550 1 utils.go:118] GRPC call: /csi.v1.Node/NodePublishVolume
I0502 14:51:38.501574 1 utils.go:119] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/globalmount","target_path":"/var/lib/kubelet/pods/ed800670-69de-4a1e-902c-e0ceeca4ec68/volumes/kubernetes.io~csi/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"azurefile-volume-tester-2tpth","csi.storage.k8s.io/pod.namespace":"azurefile-4885","csi.storage.k8s.io/pod.uid":"ed800670-69de-4a1e-902c-e0ceeca4ec68","csi.storage.k8s.io/serviceAccount.name":"default"},"volume_id":"kubetest-u33fnt21#f7bf7d45cebcf4286987d85#pre-provisioned-multiple-pods#"}
I0502 14:51:38.501732 1 nodeserver.go:84] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/globalmount at /var/lib/kubelet/pods/ed800670-69de-4a1e-902c-e0ceeca4ec68/volumes/kubernetes.io~csi/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/mount with mountOptions: [bind]
I0502 14:51:38.501750 1 mount_linux.go:175] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/globalmount /var/lib/kubelet/pods/ed800670-69de-4a1e-902c-e0ceeca4ec68/volumes/kubernetes.io~csi/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/mount)
E0502 14:51:38.503747 1 mount_linux.go:179] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/globalmount /var/lib/kubelet/pods/ed800670-69de-4a1e-902c-e0ceeca4ec68/volumes/kubernetes.io~csi/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/mount
Output: mount: /var/lib/kubelet/pods/ed800670-69de-4a1e-902c-e0ceeca4ec68/volumes/kubernetes.io~csi/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/globalmount does not exist.
E0502 14:51:38.503833 1 utils.go:123] GRPC error: rpc error: code = Internal desc = Could not mount "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/globalmount" at "/var/lib/kubelet/pods/ed800670-69de-4a1e-902c-e0ceeca4ec68/volumes/kubernetes.io~csi/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/mount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/globalmount /var/lib/kubelet/pods/ed800670-69de-4a1e-902c-e0ceeca4ec68/volumes/kubernetes.io~csi/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/mount
Output: mount: /var/lib/kubelet/pods/ed800670-69de-4a1e-902c-e0ceeca4ec68/volumes/kubernetes.io~csi/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/azurefile-4885-file.csi.azure.com-preprovsioned-pv-ql6t7/globalmount does not exist.
`NodeUnstageVolume` takes some time (delay), and for the next pod the volume could still be in `DeviceGloballyMounted` state, so `NodeStageVolume` is skipped; after a while the former `NodeUnstageVolume` completes, so for the next pod only `NodePublishVolume` is called. Is that possible @jsafrane?
It should not be possible. Kubelet should run only a single operation per volume (MountVolume or UnmountVolume). The aforementioned behavior would be a bug, if it really happens.
This affects the following scenario, can you please share your thoughts:
If there is a pod (let's say part of a StatefulSet) running on a node which uses a volume and the pod is deleted, the `NodeUnstageVolume` RPC gets called.
If completion of this `NodeUnstageVolume` RPC call takes time (for example, due to a wait on completion of the `umount` Linux command), then for the new replacement pod (scheduled on the same node) kubelet will find the volume as `DeviceGloballyMounted` and call the `NodePublishVolume` RPC without calling the `NodeStageVolume` RPC.
Later the `NodeUnstageVolume` RPC's unmount operation which was in progress would complete and unmount the volume. Thus, effectively for the new pod the staging doesn't happen.
@rohitChaware did you hit such a race condition issue recently?
Yes @andyzhangx, I reproduced the above scenario on K8s version 1.20.6.
The `NodeStageVolume` RPC isn't getting called in the case when the `NodeUnstageVolume` operation takes longer to complete (more than 2 minutes) and a new pod using the same volume is scheduled on the same node.
The flow of RPC calls goes as expected for versions < 1.20.
#100183 may help
@@ -218,10 +202,11 @@ func (c *csiMountMgr) SetUpAt(dir string, mounterArgs volume.MounterArgs) error
}

// create target_dir before call to NodePublish
if err := os.MkdirAll(dir, 0750); err != nil && !corruptedDir {
return errors.New(log("mounter.SetUpAt failed to create dir %#v: %v", dir, err))
parentDir := filepath.Dir(dir)
Can we add a unit test to make sure we're creating the right directories?
Added
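For reference, a sketch of the kind of test being asked for (not the test actually added in the PR; it reuses the hypothetical ensurePublishParentDir helper from the earlier sketch): only the parent of the target_path should be created, not the target_path itself.

```go
package csi

import (
	"os"
	"path/filepath"
	"testing"
)

// TestPublishParentDirCreation checks that only the parent of the NodePublish
// target_path is created, leaving target_path itself for the driver.
func TestPublishParentDirCreation(t *testing.T) {
	base := t.TempDir()
	target := filepath.Join(base, "pods", "pod-uid", "volumes", "kubernetes.io~csi", "pv-name", "mount")

	if err := ensurePublishParentDir(target); err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if _, err := os.Stat(filepath.Dir(target)); err != nil {
		t.Errorf("expected parent dir to exist: %v", err)
	}
	if _, err := os.Stat(target); !os.IsNotExist(err) {
		t.Errorf("expected target_path to be left for the driver, got err=%v", err)
	}
}
```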
CSI says about NodeStage (=MountDevice): // The CO SHALL ensure [...] // that the path is directory and that the process serving the // request has `read` and `write` permission to that directory. The // CO SHALL be responsible for creating the directory if it does not // exist.
When NodeExpand fails between MountDevice and SetUp, make sure the next NodeExpand attempt is called before SetUp.
Call MountDevice only when the volume was not device-mounted before.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyzhangx, msau42 The full list of commands accepted by this bot can be found here. The pull request process is described here
This PR breaks the GCE PD driver test for Windows due to a bug on the driver side.
We discussed this in the CSI meeting. We already put out a deprecation notice with 1.18, and the sanity test in csi-test has not been creating the parent directory for almost a year now, so plugins should have had enough time to fix their drivers. We'll send out an email reminder to storage/csi lists to warn driver authors that this change is coming, and they can use csi-test to validate their driver can handle this properly. For Windows testing, I think we need to get the CI in a good enough state that we can run Windows CSI tests as a presubmit to catch problems like this earlier. And also look into how to support csi-sanity on Windows.
I caught this issue through the Windows CI tests for the PD driver. A problem is that we cannot run CSI driver presubmit tests on k/k changes; CI tests can only help catch issues afterwards.
We have some CSI drivers defined in k/k e2es: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/drivers/csi.go And we have an optional presubmit job to run them whenever a volume file is changed: pull-kubernetes-e2e-gce-csi-serial. If we can stabilize the Windows presubmit, I think we can add a similar job.
Got it, thanks! Will check and add it for Windows.
*Added checks if already mounted in NodeStageVolume and NodePublishVolume
*This fixes an issue that happens after a change in kubelet 1.20 (kubernetes/kubernetes#88759)
*The issue is that since kubelet 1.20+ no longer handles mounts or mount checks, the CSI driver would bind-mount twice on the same source and target, which causes duplicate bind mounts. Subsequently, when Unpublish is called, the driver removes only a single bind mount, but others may still keep pointing to the block device, which will cause the remove_dir() call to wipe the data from the volume.
Change-Id: I89b30a3a65d2a17291e0597f34118f915264edf6 Signed-off-by: Gil <[email protected]>
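A minimal sketch of the kind of idempotency check described in that commit message (hypothetical helper, assuming k8s.io/mount-utils; the actual driver code differs):

```go
package driver

import (
	"os"

	mount "k8s.io/mount-utils"
)

// publishIfNotMounted bind-mounts source onto target only if target is not
// already a mount point, so repeated NodePublishVolume calls do not stack
// duplicate bind mounts. Hypothetical helper illustrating the fix above.
func publishIfNotMounted(mounter mount.Interface, source, target string, options []string) error {
	notMnt, err := mounter.IsLikelyNotMountPoint(target)
	if os.IsNotExist(err) {
		// Kubelet 1.20+ no longer creates target_path, so the driver does.
		if mkErr := os.MkdirAll(target, 0750); mkErr != nil {
			return mkErr
		}
		notMnt = true
	} else if err != nil {
		return err
	}
	if !notMnt {
		// Already mounted; nothing more to do.
		return nil
	}
	return mounter.Mount(source, target, "", append(options, "bind"))
}
```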
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR ignores the dir check in CSI node stage/publish. As a follow-up fix to PR #88569, the CSI driver should check the target dir in its node stage/publish functions rather than the CSI attacher/mounter doing so.
There are many reasons why we should skip the dir check in csi attacher/mounter:
Which issue(s) this PR fixes:
Fixes #86784
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/assign @jsafrane @msau42 @davidz627 @gnufied
cc @seh @kubernetes/sig-storage-pr-reviews
/priority important-soon
/sig storage