Improve error message in Kubelet CPU assignment logic #121059

Merged

Contributor

@matte21 matte21 commented Oct 8, 2023

Include the number of requested and available CPUs in the error message when the assignment of CPUs fails because there are fewer available CPUs than requested.

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Include the number of requested and available CPUs in the error message when the assignment of CPUs fails because there are fewer available CPUs than requested.

The new error message should speed up troubleshooting.

Special notes for your reviewer:

This change might be considered breaking, depending on whether the wording of error messages is guaranteed to be stable: if some client code examines the error by inspecting its message (e.g. via regexp), that client might break. The old wording was locked in by unit tests, so I had to change some expected unit test results.
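
To make the compatibility concern concrete, here is a minimal Go sketch of how a hypothetical client might inspect the message. The names `matchesExact`, `matchesPrefix`, `oldErr` and `newErr`, and the sample numbers, are assumptions for illustration and are not part of the Kubelet; only the two message wordings come from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// oldMsg is the old wording, as quoted in this PR.
const oldMsg = "not enough cpus available to satisfy request"

// exactRe is a hypothetical anchored pattern a client might have written
// against the old wording.
var exactRe = regexp.MustCompile(`^not enough cpus available to satisfy request$`)

// oldErr stands in for an error carrying the old wording.
var oldErr = errors.New(oldMsg)

// newErr builds an error with the wording proposed in this PR.
func newErr(requested, available int) error {
	return fmt.Errorf("%s: %d requested, only %d available", oldMsg, requested, available)
}

// matchesExact reports whether err matches the old wording exactly.
func matchesExact(err error) bool {
	return exactRe.MatchString(err.Error())
}

// matchesPrefix reports whether err starts with the old wording,
// which tolerates detail appended after it.
func matchesPrefix(err error) bool {
	return strings.HasPrefix(err.Error(), oldMsg)
}

func main() {
	// 4 and 2 are made-up sample values.
	fmt.Println(matchesExact(oldErr), matchesPrefix(oldErr))             // true true
	fmt.Println(matchesExact(newErr(4, 2)), matchesPrefix(newErr(4, 2))) // false true
}
```

In this sketch only the anchored, exact-match client breaks; a client that checks for the old text as a prefix keeps working because the new detail is appended.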

Does this PR introduce a user-facing change?

When the Kubelet fails to assign CPUs to a Pod because there are fewer available CPUs than the Pod requests, the error message changed from
"not enough cpus available to satisfy request" to "not enough cpus available to satisfy request: <num_requested> requested, only <num_available> available".
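
As an illustration of how the placeholders in the note above render, the following Go sketch formats the message with sample values. The helper name `insufficientCPUsError` and the numbers 8 and 6 are hypothetical; the format string follows the wording quoted in this note.

```go
package main

import "fmt"

// insufficientCPUsError is a hypothetical helper that mirrors the wording
// in the release note: <num_requested> and <num_available> become integers.
func insufficientCPUsError(requested, available int) error {
	return fmt.Errorf("not enough cpus available to satisfy request: %d requested, only %d available", requested, available)
}

func main() {
	// Sample values: a Pod asks for 8 CPUs while 6 are free.
	fmt.Println(insufficientCPUsError(8, 6))
	// prints: not enough cpus available to satisfy request: 8 requested, only 6 available
}
```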

@k8s-ci-robot added the release-note, kind/documentation, size/S, cncf-cla: yes, do-not-merge/needs-sig, and needs-triage labels on Oct 8, 2023
@k8s-ci-robot
Contributor

Hi @matte21. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-ok-to-test, needs-priority, area/kubelet, and sig/node labels and removed the do-not-merge/needs-sig label on Oct 8, 2023
@ffromani
Contributor

ffromani commented Oct 9, 2023

/ok-to-test
/triage accepted
/priority backlog

@k8s-ci-robot added the ok-to-test, triage/accepted, and priority/backlog labels and removed the needs-ok-to-test, needs-triage, and needs-priority labels on Oct 9, 2023
Contributor

@ffromani ffromani left a comment

It's (still) unclear if we consider error messages part of the API contract. They probably are not, but the observability in this area is historically poor, so it's likely there's code/tooling (including tests) actually depending on the messages. Appending to the existing text is probably the best we can do.

There's an effort aiming to improve this aspect, stemming from the effort of GA'ing memory manager but pertaining to all resource managers (obv. incl. CPU manager). Let's see how this PR can fit in this picture.

@@ -453,7 +453,7 @@ func takeByTopologyNUMAPacked(topo *topology.CPUTopology, availableCPUs cpuset.C
 		return acc.result, nil
 	}
 	if acc.isFailed() {
-		return cpuset.New(), fmt.Errorf("not enough cpus available to satisfy request")
+		return cpuset.New(), fmt.Errorf("not enough cpus available to satisfy request: %d requested, only %d available", numCPUs, availableCPUs.Size())
Contributor

@ffromani ffromani Oct 9, 2023

I'd go for a terser form: "requested %d, available %d" or even "requested=%d, available=%d". The current wording yields a nicer sentence, but IMO here practicality beats nice. Applies to everything below.

Contributor Author

Done in d4a5a08
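
For reference, a minimal Go sketch of the `requested=%d, available=%d` form suggested in the review, together with a hypothetical helper that reads the numbers back. The names `terseError`, `parseCounts` and `countsRe` are assumptions for illustration, and this thread does not state which of the two suggested forms d4a5a08 adopted.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// prefix is the existing wording that the new detail is appended to.
const prefix = "not enough cpus available to satisfy request"

// terseError renders the key=value variant suggested in the review.
func terseError(requested, available int) error {
	return fmt.Errorf("%s: requested=%d, available=%d", prefix, requested, available)
}

// countsRe extracts the two numbers from the key=value variant.
var countsRe = regexp.MustCompile(`requested=(\d+), available=(\d+)`)

// parseCounts is a hypothetical helper for tooling that reads the message back.
// It returns ok=false when the message does not carry both counts.
func parseCounts(msg string) (int, int, bool) {
	m := countsRe.FindStringSubmatch(msg)
	if m == nil {
		return 0, 0, false
	}
	// The pattern admits digits only, so Atoi can fail only on overflow.
	requested, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, 0, false
	}
	available, err := strconv.Atoi(m[2])
	if err != nil {
		return 0, 0, false
	}
	return requested, available, true
}

func main() {
	err := terseError(4, 2) // sample values
	fmt.Println(err)
	fmt.Println(parseCounts(err.Error()))
}
```

The key=value layout reads less like a sentence, but a single short pattern is enough to recover both counts, which is the practicality argument made in the review.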

@swatisehgal
Contributor

/cc

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Oct 9, 2023
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Oct 9, 2023
Include the number of requested and available CPUs in the error message
when the assignment of CPUs fails because there are fewer available
CPUs than requested.
@matte21 matte21 force-pushed the improve_err_message_in_cpu_assignments branch from a56f8e5 to d4a5a08 Compare October 9, 2023 17:32
@matte21
Contributor Author

matte21 commented Oct 9, 2023

The force push to d4a5a08 rebases on main and addresses the review comments.

@matte21
Contributor Author

matte21 commented Oct 9, 2023

/retest

@matte21
Contributor Author

matte21 commented Oct 11, 2023

@ffromani

@ffromani
Contributor

/lgtm

I think it's a nice little improvement

@k8s-ci-robot added the lgtm label on Oct 11, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b9d5f398186393dccab532299af9f2ecc6d30015

@ffromani ffromani moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Oct 11, 2023
@matte21
Contributor Author

matte21 commented Oct 11, 2023

/assign @derekwaynecarr

@ffromani
Contributor

/cc @klueska @mrunalp

@derekwaynecarr
Member

It is fine to update error messages.

/approve

@k8s-ci-robot added the approved label on Oct 16, 2023
Member

@saschagrunert saschagrunert left a comment

/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, matte21, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit c7d2703 into kubernetes:master Oct 16, 2023
13 of 14 checks passed
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 16, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 16, 2023
@matte21 matte21 deleted the improve_err_message_in_cpu_assignments branch October 16, 2023 14:51
Labels
approved, area/kubelet, cncf-cla: yes, kind/documentation, lgtm, ok-to-test, priority/backlog, release-note, sig/node, size/S, triage/accepted
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

6 participants