-
Notifications
You must be signed in to change notification settings - Fork 40.9k
DRA plugin: handle serving failures #132598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
DRA plugin: handle serving failures #132598
Conversation
Skipping CI for Draft Pull Request. |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
a214e7d
to
ea7767d
Compare
/pull-kubernetes-verify |
/test pull-kubernetes-unit-windows-master |
ea7767d
to
80517a9
Compare
/test pull-kubernetes-unit-windows-master |
/test pull-kubernetes-node-e2e-crio-cgrpv2-dra |
d1a6655
to
07d0757
Compare
/test pull-kubernetes-linter-hints |
/cc @nojnhuh |
@@ -274,15 +274,15 @@ func PluginSocket(name string) Option { | |||
} | |||
} | |||
|
|||
// PluginListener configures how to create the registrar socket. | |||
// PluginListener configures how to create the socket listener. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The listener for which socket?
"plugin" and "endpoint" are both ambiguous. Let's be clear about which socket we are dealing with.
I also like "create socket" better than "create socket listener". The main purpose of the function is to create a socket; the returned net.Listener
is then just an implementation detail.
The Go documentation is similar: "net.Listen" is described as "announces on the local network address", not as "creates a listener".
So I would fix the copy-and-paste error with:
PluginListener configures how to create the DRA service socket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The listener for which socket?
For either plugin(service) or registrar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PluginListener configures how to create the DRA service socket
This could be understood as it's for creating plugin(service) socket only, which is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For either plugin(service) or registrar.
Really? There's a separate RegistrarListener
for the registrar socket.
This could be understood as it's for creating plugin(service) socket only, which is not correct.
It is correct. The parameter of PluginListener
is only used for one socket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing out! Fixed.
/test pull-kubernetes-node-e2e-crio-cgrpv2-dra |
07d0757
to
708afa1
Compare
@@ -430,6 +430,18 @@ func DRAService(enabled bool) Option { | |||
} | |||
} | |||
|
|||
// ErrorChannel allows the plugin to send errors to the caller. This is | |||
// useful to report errors that happen in goroutines. The caller is expected | |||
// to read from the channel and handle errors appropriately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add guidance for DRA driver authors for what "handle errors appropriately" means?
Log them and continue? Log and exit because all errors are fatal? Are there special errors that are merely informative (probably not, but worth clarifying)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only usage of this for now is handling grpc.Server.Serve errors. Below explanation suggests performing a graceful shutdown. In other cases (feature cases) it could be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It says "log this error and perform a graceful shutdown" (emphasis mine) - but it doesn't say how the caller can detect this error.
Either we need to document how it can detect that error, or recommend that it does that action for all errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the description, PTAL.
// to read from the channel and handle errors appropriately. | ||
// For example, this channel can be used to report errors returned by the | ||
// grpcserver.Serve. Plugins then can use this error to perform a graceful shutdown. | ||
func ErrorChannel(errorChannel chan error) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func ErrorChannel(errorChannel chan error) Option { | |
func ErrorChannel(errorChannel chan<- error) Option { |
The helper is only meant to write to this channel, not read from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks for pointing out!
@@ -78,7 +78,10 @@ func startGRPCServer(logger klog.Logger, grpcVerbosity int, unaryInterceptors [] | |||
defer s.wg.Done() | |||
err := s.server.Serve(listener) | |||
if err != nil { | |||
logger.Error(err, "GRPC server failed") | |||
logger.Error(err, "gRPC server failed to serve") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's log here only if we don't pass on the error. Otherwise it might get logged twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
708afa1
to
c361112
Compare
This change introduces an ErrorChannel option to the DRA kubelet plugin, allowing errors from goroutines (such as gRPC server failures) to be reported to the caller.
Refactor the DRA e2e_node test helpers and test cases to accept variadic kubeletplugin.Option arguments. This change improves test flexibility and maintainability, allowing new options to be passed in the future without requiring widespread code changes. There are no functional changes to test coverage or behavior.
Added an e2e_node test to verify that the DRA plugin and registration services report gRPC server serving error correctly. The closed listener is created intentionally to simulate a serving error, ensuring that grpc.Server.Serve fails as expected and the error is sent to the error channel.
c361112
to
5d8e353
Compare
What type of PR is this?
/kind bug
/kind cleanup
What this PR does / why we need it:
ErrorChannel
option to the DRA plugin to report unrecoverable errors via a channel.Special notes for your reviewer:
If this PR is accepted I'm going to modify example driver to fail on serving failures.
Does this PR introduce a user-facing change?