
Nicer value rendering in API errors #132314


Open
thockin wants to merge 3 commits into master from jp_nicer_api_errors


thockin
Member

@thockin thockin commented Jun 15, 2025

Today, if the value passed is a struct, map, or list, we get Go's native rendering, which is clunky.

This uses JSON (could be kyaml when that is ready) instead.

I hear it already: "But JSON is slow!". I benchmarked it -- for a simple int or string field, JSON is only a little slower (~20%) than a type assertion, but it IS slower, so I left the type assertion in. Remember that this is only called when an API error has occurred.

The type assertions do not handle typedefs-to-{string, int64, etc}, so those fall back to JSON. Almost all of our errors go through standard functions that demand a string or int64 anyway, so handling typedefs in the fast path would be mostly pointless.
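A minimal sketch of the approach described above: a type switch handles plain string, int, int64 and bool values, and everything else (named types such as a hypothetical `type Phase string`, plus structs, maps and slices) falls back to `encoding/json`. The names `renderValue`, `Phase` and `Port` are illustrative assumptions, not the PR's actual code, and so is the `%#v` fallback when marshalling fails.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Phase is a hypothetical typedef-to-string; the type switch below does not
// match it, so it takes the JSON path.
type Phase string

// Port is a hypothetical composite value.
type Port struct {
	Name string `json:"name"`
	Port int32  `json:"port"`
}

// renderValue is a hypothetical helper that renders a bad value for an API
// error message.
func renderValue(v any) string {
	// Fast path: exact concrete types only, via type switch.
	switch t := v.(type) {
	case string:
		return strconv.Quote(t)
	case int:
		return strconv.Itoa(t)
	case int64:
		return strconv.FormatInt(t, 10)
	case bool:
		return strconv.FormatBool(t)
	}
	// Slow path: typedefs and composite types go through JSON.
	b, err := json.Marshal(v)
	if err != nil {
		// Assumed fallback: Go's native rendering.
		return fmt.Sprintf("%#v", v)
	}
	return string(b)
}

func main() {
	fmt.Println(renderValue("abc"))                        // "abc"
	fmt.Println(renderValue(int64(42)))                    // 42
	fmt.Println(renderValue(Phase("Running")))             // "Running" (via JSON)
	fmt.Println(renderValue(Port{Name: "http", Port: 80})) // {"name":"http","port":80}
}
```

One wrinkle with this split: `strconv.Quote` and `encoding/json` escape some characters differently (JSON escapes `<`, `>` and `&` by default), so a real implementation might want a single escaping rule for both paths.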

I also benchmarked using reflect to check CanInt() and that is almost exactly as fast as type-switch but handles more cases, so we COULD switch to that instead, if we wanted. I thought it wasn't worth the complexity.
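The reflect-based alternative could look roughly like this, assuming Go 1.18+ for `reflect.Value.CanInt` and `CanUint`. Checking the kind rather than the concrete type lets named types such as the hypothetical `Replicas` take the fast path; the trade-off is that such a type's own `MarshalJSON` would then be bypassed. `renderValueReflect` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strconv"
)

// Hypothetical typedefs that a concrete type switch would miss.
type Replicas int32
type Phase string

// renderValueReflect dispatches on reflect kind instead of concrete type.
func renderValueReflect(v any) string {
	rv := reflect.ValueOf(v)
	switch {
	case !rv.IsValid(): // untyped nil
		return "null"
	case rv.CanInt(): // int, int8 ... int64 and typedefs of them
		return strconv.FormatInt(rv.Int(), 10)
	case rv.CanUint(): // uint, uint8 ... uintptr and typedefs of them
		return strconv.FormatUint(rv.Uint(), 10)
	case rv.Kind() == reflect.String:
		return strconv.Quote(rv.String())
	case rv.Kind() == reflect.Bool:
		return strconv.FormatBool(rv.Bool())
	}
	// Composite types still go through JSON.
	b, err := json.Marshal(v)
	if err != nil {
		return fmt.Sprintf("%#v", v)
	}
	return string(b)
}

func main() {
	fmt.Println(renderValueReflect(Replicas(3)))    // 3
	fmt.Println(renderValueReflect(Phase("Ready"))) // "Ready"
	fmt.Println(renderValueReflect([]int{1, 2}))    // [1,2]
}
```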

JSON is really there to handle composite types.

/kind bug
/kind cleanup

Release note: NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 15, 2025
@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jun 15, 2025
@dims dims added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 15, 2025
@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

Contributor

@yongruilin yongruilin left a comment


/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 17, 2025
@thockin thockin force-pushed the jp_nicer_api_errors branch from 96a6584 to 6fcb038 June 18, 2025 02:19
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Jun 18, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Jun 18, 2025
@thockin
Member Author

thockin commented Jun 18, 2025

Open questions:

  • For types that define String(), should we prefer that or JSON? I chose JSON, but it changes some results. If we choose String(), do we put explicit quotes around it or leave it "naked"? One place (logs) renders int-type values as hex because of the default %#v, but other places, like time, render things that are clearly meant to be strings. We would not want to add MarshalJSON() to logs, since hex is not JSON. We could add ANOTHER optional method, like MarshalLog() or MarshalErrorValue()? @pohly

  • metav1.Time has a MarshalJSON() and inherits a String() (from embedded time.Time) and they are different - should we make them the same? @deads2k

  • Since validation runs on internal types, we still get some GoNames instead of goNames, but this was true before.

@thockin thockin force-pushed the jp_nicer_api_errors branch from 6fcb038 to 5ac9af4 June 18, 2025 02:52
@enj enj moved this to Needs Triage in SIG Auth Jun 18, 2025
thockin added 3 commits June 19, 2025 10:11
Today, if the value passed is a struct, map, or list, we get Go's native
rendering, which is clunky.

This uses JSON (could be kyaml when that is ready) instead.

I hear it already: "But JSON is slow!". I benchmarked it -- for a
simple int or string field, JSON is only a little slower (~20%) than a
type assertion, but it IS slower, so I left the type assertion in.
Remember that this is only called when an API error has occurred.

The type assertions do not handle typedefs-to-{string, int64, etc}, so
those fall back to JSON. Almost all of our errors go through standard
functions that demand string or int64 anyway, so handling typedefs in the
fast path would be mostly pointless.

I also benchmarked using reflect to check `CanInt()` and that is almost
exactly as fast as type-switch but handles more cases, so we COULD
switch to that instead, if we wanted. I thought it wasn't worth the
complexity.

JSON is really there to handle composite types.

Notes:
* For types that define String() - should we prefer that or JSON?
* metav1.Time has a MarshalJSON() and inherits a String() and they are
  different
* Since validation runs on internal types, we still get some GoNames
  instead of goNames.
@thockin thockin force-pushed the jp_nicer_api_errors branch from 5ac9af4 to e68d601 June 19, 2025 01:12
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stlaz stlaz moved this from Needs Triage to In Review in SIG Auth Jun 23, 2025
@stlaz
Member

stlaz commented Jun 23, 2025

(triage):
looks good from sig-auth side (credentialplugin changes)

@pohly
Contributor

pohly commented Jun 25, 2025

For types that define String() - should we prefer that or JSON? I chose JSON, but it changes some results.

At first glance, this seems like a situation where a user-visible representation of a value is needed, which is what String is supposed to provide. Was JSON chosen because some of our values have String implementations that are fairly unreadable (the protobuf-generated String implementations come to mind), or because we want a complete dump of the bad value (similar to how some test frameworks dump the entire error in addition to the "summary string" returned by Error)?

If we choose String() then do we put explicit quotes or leave it "naked"?

As we use error-wrapping style (i.e. <prefix>: <details>) one can read from left to right and figure out what the value is without quoting it. I prefer leaving it "naked".

@pohly
Contributor

pohly commented Jun 25, 2025

one can read from left to right and figure out what the value is without quoting it

Except that there is more text after the value:

path.to.field: Invalid value: "the value": the details

That invalidates my argument and quoting becomes necessary.

Or can we shuffle things around?

path.to.field: Invalid value, the details: the value

@pohly
Contributor

pohly commented Jun 25, 2025

Apropos error rendering: should validation tests check for expected errors by comparing against full strings, against more or less complete substrings, or against errors produced by calling the same error method as the validation code?

For DRA, I chose the latter because I didn't want the test to depend on the implementation of those methods. As seen in this PR, some other validation tests use strings which then need to be updated when changing the implementation. Strings have the advantage that one can check the readability of the user-visible error message and more easily spot when the wrong method is used when the result makes no sense.

@thockin
Member Author

thockin commented Jun 26, 2025

The genesis of this PR (other than it being a long-standing annoyance) is declarative validation. We added support to auto-check listmap types for duplicates and throw errors. Testing those exposed the fact that we produce a "duplicate value" error where the value is (for example) a whole container. It's clearly NOT a duplicate value; the key is buried in there.

I thought, "Perhaps we can return a map[string]any with just the key field(s) set," but that still gets rendered with Go's %#v, which is not super helpful. If I run it through JSON (or KYAML), though, it is nicer.
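For a hypothetical list-map entry keyed by `containerPort` and `protocol`, the difference between Go's `%#v` and JSON for a `map[string]any` key looks like this (both sort map keys). `renderGo` and `renderJSON` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// renderGo is Go's native rendering of a value.
func renderGo(v any) string { return fmt.Sprintf("%#v", v) }

// renderJSON renders v as JSON, falling back to %#v if marshalling fails.
func renderJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return renderGo(v)
	}
	return string(b)
}

func main() {
	// Hypothetical key fields of a duplicated list-map entry.
	key := map[string]any{"containerPort": 80, "protocol": "TCP"}
	fmt.Println(renderGo(key))   // map[string]interface {}{"containerPort":80, "protocol":"TCP"}
	fmt.Println(renderJSON(key)) // {"containerPort":80,"protocol":"TCP"}
}
```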

As for readability, I don't think we intend the final error to be machine parseable or splittable, but I am eager to make it more useful to humans.

E.g. is something like "Invalid value ("the value"): the details" better? I think the quotes are useful to distinguish "true" from true (as in labels) or 4 from "4" (as in resources).

As for unit tests, I prefer they operate with the new Matcher logic, so they are less dependent on exact strings. For example, I want to make the error message better for dns-label, and it breaks hundreds of tests. This is why we are adding "origin".

@pohly
Contributor

pohly commented Jun 27, 2025

where the value is (for example) a whole container

So that's exactly the case where ignoring the fmt.Stringer implementation in favor of some nicer rendering makes sense. I sometimes wish we didn't need those fmt.Stringer implementations (but protobuf needs them), or that they had nicer output (let's replace them with KYAML?!), but for now preferring JSON as proposed in this PR makes sense.

One can also argue that the API errors are meant to provide a data dump of the values, not just a user visible rendering, because one may have to inspect the entire value.

I don't think we intend the final error to be machine parseable or splittable

Agreed, that's why I thought it would be okay to not use quoting. Quoting can make strings less readable, and leaving it out works fine as long as humans can "spot" where the value starts.

is something like "Invalid value ("the value"): the details" better

If we keep quoting the value, then path.to.field: Invalid value: "the value": the details is fine.

I prefer they operate with the new Matcher logic, so they are less dependent on exact strings

So produce expected errors and compare against the actual errors, which is what DRA does - except that it doesn't use the ErrorMatcher helper yet. Let me look into changing that now...

@thockin
Member Author

thockin commented Jun 27, 2025

The advantage of the matcher is that you can decide which criteria to match. Often the field path + error type + detail substring is OK, but as we add more origins, the detail string matters less and becomes the opposite of useful.
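A rough sketch of criteria-based matching, assuming a hypothetical `fieldError` with an `Origin` field and a hypothetical `matcher` type; it is not the real ErrorMatcher API, and the origin string `format=dns-label` is made up. Matching by field, type and origin keeps passing when the detail wording changes, while a detail-substring check does not.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldError is a hypothetical validation error carrying an origin tag in
// addition to the human-readable detail.
type fieldError struct {
	Field, Type, Origin, Detail string
}

// matcher is a hypothetical criteria-based matcher; each flag enables one
// comparison.
type matcher struct {
	byField, byType, byOrigin, byDetailSubstring bool
}

// matches reports whether got satisfies every enabled criterion from want.
func (m matcher) matches(want, got fieldError) bool {
	if m.byField && want.Field != got.Field {
		return false
	}
	if m.byType && want.Type != got.Type {
		return false
	}
	if m.byOrigin && want.Origin != got.Origin {
		return false
	}
	if m.byDetailSubstring && !strings.Contains(got.Detail, want.Detail) {
		return false
	}
	return true
}

func main() {
	got := fieldError{
		Field:  "metadata.name",
		Type:   "Invalid value",
		Origin: "format=dns-label", // hypothetical origin string
		Detail: "must consist of lower case alphanumeric characters or '-'",
	}
	want := fieldError{Field: "metadata.name", Type: "Invalid value", Origin: "format=dns-label"}

	byOrigin := matcher{byField: true, byType: true, byOrigin: true}
	fmt.Println(byOrigin.matches(want, got)) // true: detail wording is ignored

	want.Detail = "must be a DNS label" // wording an older test might expect
	byDetail := matcher{byField: true, byType: true, byDetailSubstring: true}
	fmt.Println(byDetail.matches(want, got)) // false: reworded detail breaks it
}
```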

@pohly
Contributor

pohly commented Jun 27, 2025

I want my error matching to be pretty complete, but the "origin instead of detail string" is nice. I converted pkg/apis/resource/validation, which included making some changes elsewhere - see #132577

@thockin
Member Author

thockin commented Jun 30, 2025

AFAIK this is OK to review now.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Needs Triage
Status: In Review

9 participants