Fix device uncertain errors on reboot #122211
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Wed Dec 6 22:15:04 UTC 2023.
d0eb7b0 to ed0faca
/kind bug
@gnufied please provide a release note and test coverage, thanks.
Got it, thank you for the details! The only thing is that when the driver is not available, if we can identify this error, we should avoid changing any state, such as calling MarkDeviceAsUnmounted. But this is out of scope for this issue.
Yeah, I agree. Let's file a follow-up issue to clean things up. I want to keep things short in this PR, because we need to backport this to older versions.
/lgtm
LGTM label has been added. Git tree hash: 18bb9b2d4a484b6163937c66b7f5ea8b09732ae8
Hi @gnufied, I see that we are backporting this PR to v1.29. Can we also backport it to the active branches 1.26, 1.27, and 1.28? A lot of users hitting this issue are on older versions, and not many of them can upgrade to v1.29 immediately.
@PhanLe1010 yeah sure.
/cherry-pick release-1.28
Thanks a lot @gnufied! We very much appreciate your help.
…211-upstream-release-1.27 Automated cherry pick of #122211: Fix device uncertain errors on reboot
…211-upstream-release-1.26 Automated cherry pick of #122211: Fix device uncertain errors on reboot
…211-upstream-release-1.28 Automated cherry pick of #122211: Fix device uncertain errors on reboot
…211-upstream-release-1.29 Automated cherry pick of #122211: Fix device uncertain errors on reboot
Fixes #119608
The underlying cause of this issue: when a node that has pods with raw block volumes is shut down, and it stays down long enough for the pods on it to be evicted (or deleted), those pods will usually be stuck in the terminating state once the node comes back.
They are stuck in the terminating state because, although the volume read from the pod dir gets added to ASOW in the `uncertain` state, a mount may still be performed for the volume. When that happens, the mount usually fails, and that causes the volume to be updated as `unmounted` in ASOW. This update also removes `deviceMountPath` from the volume, and hence the subsequent attempt to `unmap` the device fails because `deviceMountPath` is empty.