-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error message in Kubelet CPU assignment logic #121059
Improve error message in Kubelet CPU assignment logic #121059
Conversation
Hi @matte21. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's (still) unclear if we consider error messages part of the API contract. They probably are not, but the observability in this area is historically poor, so it's likely there's code/tooling (including tests) actually depending on the messages. Appending to the existing text is probably the best we can do.
There's an effort aiming to improve this aspect, stemming from the effort of GA'ing memory manager but pertaining to all resource managers (obv. incl. CPU manager). Let's see how this PR can fit in this picture.
@@ -453,7 +453,7 @@ func takeByTopologyNUMAPacked(topo *topology.CPUTopology, availableCPUs cpuset.C | |||
return acc.result, nil | |||
} | |||
if acc.isFailed() { | |||
return cpuset.New(), fmt.Errorf("not enough cpus available to satisfy request") | |||
return cpuset.New(), fmt.Errorf("not enough cpus available to satisfy request: %d requested, only %d available", numCPUs, availableCPUs.Size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go for a terser form: "requested %d, available %d" or even "requested=%d, available=%d". The current wording yields a nicer sentence, but IMO here practicality beats nice. Applies to everything below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d4a5a08
/cc |
Include number of requested and available CPUs in the error message when the assignment of CPUs fails because there are less available CPUs than requested.
a56f8e5
to
d4a5a08
Compare
The force push to d4a5a08 rebases on main and addresses the review comments. |
/retest |
/lgtm I think it's a nice little improvement |
LGTM label has been added. Git tree hash: b9d5f398186393dccab532299af9f2ecc6d30015
|
/assign @derekwaynecarr |
it is fine to update error messages. /approve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: derekwaynecarr, matte21, saschagrunert The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Include number of requested and available CPUs in the error message when the assignment of CPUs fails because there are less available CPUs than requested.
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Include number of requested and available CPUs in the error message when the assignment of CPUs fails because there are less available CPUs than requested.
The new error message should speed up troubleshooting.
Special notes for your reviewer:
This change might be considered breaking, depending on whether the wording of error messages is guaranteed to be stable. i.e. if some client code examines the error by inspecting its message (e.g. via regexp), that client might break. The old wording was locked by unit tests, so I had to change some expected unit tests results.
Does this PR introduce a user-facing change?