Skip to content

WIP: api: apply field validation to dropped fields #130467

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Feb 27, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

When a field gets dropped because the corresponding feature gate is disabled, the client currently gets neither a warning when fieldValidation: Warn is set (also the default), nor does it get an error for fieldValidation: Strict.

This is inconsistent with submitting the same request to an older Kubernetes which doesn't have the feature at all yet: there the fields are unknown and trigger warnings resp. an error during decoding.

This PR addresses that by allowing the REST strategy implementation to return warnings when dropping fields. The create and update code then return those warnings or turn them into an error when field validation is strict.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

TBD

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot
Copy link
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 27, 2025
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 27, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/apiserver area/test kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 27, 2025
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 27, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Feb 27, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Auth Feb 27, 2025
@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Feb 27, 2025
@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Feb 27, 2025
Copy link
Contributor Author

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR even though the "real" changes are in a limited set of files. Let me call out those files by asking questions about the current approach (still in flux).

/cc @liggitt @deads2k @jpbetz

Continuation of https://kubernetes.slack.com/archives/C0EG7JC6T/p1739953008081669

@@ -1088,6 +1088,9 @@ const (
// CauseTypeResourceVersionTooLarge is used to report that the requested resource version
// is newer than the data observed by the API server, so the request cannot be served.
CauseTypeResourceVersionTooLarge CauseType = "ResourceVersionTooLarge"
// CauseTypeUnsupported is used to report that support for a resource or
// field is disabled.
CauseTypeUnsupported CauseType = "Unsupported"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When warning about a dropped field, the StatusCause needs a type.

Not sure whether a new type is needed. The closest existing one is "FieldValueNotSupported", which kind of makes sense because the non-nil field in the parent struct is also a value...

// caller will return them to the client as if they had been generated
// in WarningsOnCreate. If it is Strict, the warnings are turned into
// an error and the operation fails immediately.
PrepareForCreate(ctx context.Context, obj runtime.Object, fieldValidation string) (warnings []string, err error)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still debating with myself what the best return values are here...

The error only gets added for the sake of completeness as I am touching this method and all implementations. It seems useful that an implementer should also be able to say "stop right now, something is completely wrong" here. But for warnings about dropped fields its not needed.

I'm currently using warnings []string for consistency with WarningsOnCreate. That's fine when just passing those warnings through and has the advantage that warnings are not limited to dropped fields.

But when converting the warnings to an error, the caller has to parse the warning and provide a common error text, so in the end those have to be warnings about dropped fields and they have to have the expected format.

I'm therefore leaning towards a less flexible return value. field.ErrorList is too flexible because Error has additional fields (Type, BadValue) that we don't need.

[]struct{Field string, Message string} ?

Alternatively, we can keep the prototype as it is and require that implementations themselves turn warnings into an error for fieldValidation == "Strict", then return that error instead of warnings. A helper function could be provided which does that.

Copy link
Member

@liggitt liggitt Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple thoughts:

  1. I'd need to think more about what it would mean to add a fatal error return to this code path
  2. If possible, I'd want to handle what we do with dropped fieldPaths centrally and minimize the surface area / reliance on dozens of individual strategy methods constructing consistent messages and distinguishing between warn and error correctly

I think the minimal surface area required here is:

  1. In parameters, tell PrepareForCreate / PrepareForUpdate if the caller needs to know about dropped field paths (since it adds some complexity / cost to accumulate those ... if fieldValidation is "Ignore", there's no point in doing that work)
  2. In the return value, let PrepareForCreate / PrepareForUpdate return any dropped field paths, possibly with an associated reason string for each about why they were dropped

If we're changing the strategy signature, I'd probably plan for the future and add an input type and a return type we can augment in the future if needed

possibly something like:

type PrepareForCreateOptions struct {
  ReturnDroppedFieldPaths bool
}
type PrepareForCreateResult struct {
  DroppedFieldPaths []string // or map[string]string to give associated reasons?
}
PrepareForCreate(ctx context.Context, obj runtime.Object, opts PrepareForCreateOptions) PrepareForCreateResult

type PrepareForUpdateOptions struct {
  ReturnDroppedFieldPaths bool
}
type PrepareForUpdateResult struct {
  DroppedFieldPaths []string // or map[string]string to give associated reasons?
}
PrepareForUpdate(ctx context.Context, obj, old runtime.Object, opts PrepareForUpdateOptions) PrepareForUpdateResult

Then, in the one place we call these methods, we decide if we care about tracking dropped field paths based on whether we need to warn or error, and we handle warning or erroring based on the result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I'd want to handle what we do with dropped fieldPaths centrally and minimize the surface area / reliance on dozens of individual strategy methods constructing consistent messages and distinguishing between warn and error correctly.

Fully agreed. Whatever we do, it would be a lot of manual work.

In addition, like validation itself it also is conceptually at the wrong place: the field paths are for the internal API version. If the external APIs do not all use exactly the same paths, then users get confusing errors or warnings for those API versions with different paths. DRA will have this situation because we are restructuring a bit in v1beta2.

This leads me to: can we do this as part of declarative validation?

We already have the +featureGate annotation. Generating code which does the usual "if any feature-gated field is set or feature gate enabled, return, otherwise drop feature-gated fields" logic should be doable.

/cc @alexzielenski @jpbetz @yongruilin

Copy link
Member

@liggitt liggitt Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leads me to: can we do this as part of declarative validation?

Only if we can guarantee we never error / warn for this on an update of an existing object that already contains data in that field

Generating code which does the usual "if any feature-gated field is set or feature gate enabled, return, otherwise drop feature-gated fields" logic should be doable.

Are we able to perfectly automatically detect / correlate use of the gated field in the old object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if we can guarantee we never error / warn for this on an update of an existing object that already contains data in that field

That's what I meant with "if any feature-gated field is set" - just accidentally dropped the "in the existing object".

Are we able to perfectly automatically detect / correlate use of the gated field in the old object?

That's a question to the people implementing declarative validation. We would need access to the stored object in the same version that the incoming update has. Then whatever is set there should match the +featureGate annotations and thus the code generated for that version.

details.Causes = append(details.Causes, cause)
}
err.ErrStatus.Details = details
return err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversion into an error needs to go into some helper function, either one that is shared by create and update or exported such that implementations can use it.

if warn {
warnings = append(warnings, fmt.Sprintf("spec.devices.requests[%d].adminAccess: %s disabled", i, features.DRAAdminAccess))
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a proof-of-concept I've converted this "drop disabled fields". There's a corresponding E2E test in test/e2e/dra.go.

If we go ahead with this, do I need to update all "drop disabled fields" implementations and write tests for that in this PR? It would be desirable for the sake of consistency, but also a lot of busy work and lead to a very large PR.

The alternative is to add the ability to do this in one PR, then convert individual API types one-by-one. That work can be spread across different contributors.

I'm leaning towards the latter approach. They key argument is that clients cannot rely on getting an error for a while because they probably need to support also older Kubernetes release where this new functionality wasn't available yet. For DRA, we need such a "plan B" for DRA drivers running on Kubernetes 1.32.

@pohly pohly moved this from 🆕 New to 🏗 In progress in Dynamic Resource Allocation Feb 27, 2025
@liggitt liggitt added the api-review Categorizes an issue or PR as actively needing an API review. label Feb 27, 2025
@liggitt liggitt moved this to Backlog in API Reviews Feb 27, 2025
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Archive-it in SIG Node CI/Test Board Mar 5, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sftim
Copy link
Contributor

sftim commented Mar 12, 2025

If this is relevant to #127701, I suggest we make this a candidate for backports.

It won't cover everything but the fewer times we have to triage a sidecar being treated as an init container (and people filing bugs against the Helm chart, Kubernetes, etc) the better.

@pohly
Copy link
Contributor Author

pohly commented Mar 13, 2025

If this is relevant to #127701, I suggest we make this a candidate for backports.

It is, but it doesn't meet our backport criteria: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-release/cherry-picks.md#what-kind-of-prs-are-good-for-cherry-picks

@sftim
Copy link
Contributor

sftim commented Mar 13, 2025

(quoting from the page about backports)

If upon reflection you wish to continue, bolster your case by supplementing your PR with e.g.,

  • A GitHub issue detailing the problem
  • Scope of the change
  • Risks of adding a change
  • Risks of associated regression
  • Testing performed, test cases added
  • Key stakeholder SIG reviewers/approvers attesting to their confidence in the change being a required backport

I'm willing to support this. From my perspective, I want to preempt the problem #127701 calls out (the ecosystem sees a stream of issues from users confused about init vs. sidecar containers). I'd rather have this ahead of GA (for sidecar containers) than feel obliged to jump in to comms after the problem proves to be serious.
However, if it's just me, I wouldn't have capacity to champion the backport. Would SIG Node have the inclination and capacity to promote why a backport is worthwhile? If folks aren't convinced, I'll go quiet on this backport idea.

@pohly
Copy link
Contributor Author

pohly commented Mar 13, 2025

I can't speak for SIG Node, but considering how large this PR might become (note that it's not complete yet!) I expect considerable and justified pushback against backporting this. If I were SIG Node, I wouldn't try. SIG Arch might be a better champion for it, but even then I find it unlikely.

What SIG Node can do is delay graduation until this PR is merged. So far, it is unclear when that will be. It has not received any comments on the implementation questions that I raised.

@liggitt
Copy link
Member

liggitt commented Mar 13, 2025

If this is relevant to #127701, I suggest we make this a candidate for backports.

I don't think there's any way we would backport this. This requires signature changes and touching many many implementation files.

@sftim
Copy link
Contributor

sftim commented Mar 13, 2025

Ok, thanks @liggitt.

@k8s-ci-robot k8s-ci-robot requested a review from yongruilin March 24, 2025 11:52
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 24, 2025
@bart0sh bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Jun 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: Backlog
Status: 🏗 In progress
Status: Needs Triage
Status: Needs Triage
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants