WIP: api: apply field validation to dropped fields #130467

pohly · 2025-02-27T07:38:33Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

When a field gets dropped because the corresponding feature gate is disabled, the client currently gets neither a warning when fieldValidation: Warn is set (also the default), nor does it get an error for fieldValidation: Strict.

This is inconsistent with submitting the same request to an older Kubernetes which doesn't have the feature at all yet: there the fields are unknown and trigger warnings resp. an error during decoding.

This PR addresses that by allowing the REST strategy implementation to return warnings when dropping fields. The create and update code then return those warnings or turn them into an error when field validation is strict.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

TBD

k8s-ci-robot · 2025-02-27T07:38:35Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot · 2025-02-27T07:38:35Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2025-02-27T07:38:41Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2025-02-27T07:38:45Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/registry/OWNERS
staging/src/k8s.io/apiextensions-apiserver/OWNERS
staging/src/k8s.io/apimachinery/pkg/OWNERS
staging/src/k8s.io/apimachinery/pkg/apis/OWNERS
staging/src/k8s.io/apiserver/OWNERS
staging/src/k8s.io/kube-aggregator/OWNERS
~~test/integration/dra/OWNERS~~ [pohly]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pohly

Large PR even though the "real" changes are in a limited set of files. Let me call out those files by asking questions about the current approach (still in flux).

/cc @liggitt @deads2k @jpbetz

Continuation of https://kubernetes.slack.com/archives/C0EG7JC6T/p1739953008081669

pohly · 2025-02-27T07:43:10Z

staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go

@@ -1088,6 +1088,9 @@ const (
 	// CauseTypeResourceVersionTooLarge is used to report that the requested resource version
 	// is newer than the data observed by the API server, so the request cannot be served.
 	CauseTypeResourceVersionTooLarge CauseType = "ResourceVersionTooLarge"
+	// CauseTypeUnsupported is used to report that support for a resource or
+	// field is disabled.
+	CauseTypeUnsupported CauseType = "Unsupported"


When warning about a dropped field, the StatusCause needs a type.

Not sure whether a new type is needed. The closest existing one is "FieldValueNotSupported", which kind of makes sense because the non-nil field in the parent struct is also a value...

pohly · 2025-02-27T07:57:27Z

staging/src/k8s.io/apiserver/pkg/registry/rest/create.go

+	// caller will return them to the client as if they had been generated
+	// in WarningsOnCreate. If it is Strict, the warnings are turned into
+	// an error and the operation fails immediately.
+	PrepareForCreate(ctx context.Context, obj runtime.Object, fieldValidation string) (warnings []string, err error)


Still debating with myself what the best return values are here...

The error only gets added for the sake of completeness as I am touching this method and all implementations. It seems useful that an implementer should also be able to say "stop right now, something is completely wrong" here. But for warnings about dropped fields its not needed.

I'm currently using warnings []string for consistency with WarningsOnCreate. That's fine when just passing those warnings through and has the advantage that warnings are not limited to dropped fields.

But when converting the warnings to an error, the caller has to parse the warning and provide a common error text, so in the end those have to be warnings about dropped fields and they have to have the expected format.

I'm therefore leaning towards a less flexible return value. field.ErrorList is too flexible because Error has additional fields (Type, BadValue) that we don't need.

[]struct{Field string, Message string} ?

Alternatively, we can keep the prototype as it is and require that implementations themselves turn warnings into an error for fieldValidation == "Strict", then return that error instead of warnings. A helper function could be provided which does that.

A couple thoughts:

I'd need to think more about what it would mean to add a fatal error return to this code path

If possible, I'd want to handle what we do with dropped fieldPaths centrally and minimize the surface area / reliance on dozens of individual strategy methods constructing consistent messages and distinguishing between warn and error correctly

I think the minimal surface area required here is:

In parameters, tell PrepareForCreate / PrepareForUpdate if the caller needs to know about dropped field paths (since it adds some complexity / cost to accumulate those ... if fieldValidation is "Ignore", there's no point in doing that work)

In the return value, let PrepareForCreate / PrepareForUpdate return any dropped field paths, possibly with an associated reason string for each about why they were dropped

If we're changing the strategy signature, I'd probably plan for the future and add an input type and a return type we can augment in the future if needed

possibly something like:

type PrepareForCreateOptions struct { ReturnDroppedFieldPaths bool } type PrepareForCreateResult struct { DroppedFieldPaths []string // or map[string]string to give associated reasons? } PrepareForCreate(ctx context.Context, obj runtime.Object, opts PrepareForCreateOptions) PrepareForCreateResult type PrepareForUpdateOptions struct { ReturnDroppedFieldPaths bool } type PrepareForUpdateResult struct { DroppedFieldPaths []string // or map[string]string to give associated reasons? } PrepareForUpdate(ctx context.Context, obj, old runtime.Object, opts PrepareForUpdateOptions) PrepareForUpdateResult

Then, in the one place we call these methods, we decide if we care about tracking dropped field paths based on whether we need to warn or error, and we handle warning or erroring based on the result

If possible, I'd want to handle what we do with dropped fieldPaths centrally and minimize the surface area / reliance on dozens of individual strategy methods constructing consistent messages and distinguishing between warn and error correctly.

Fully agreed. Whatever we do, it would be a lot of manual work.

In addition, like validation itself it also is conceptually at the wrong place: the field paths are for the internal API version. If the external APIs do not all use exactly the same paths, then users get confusing errors or warnings for those API versions with different paths. DRA will have this situation because we are restructuring a bit in v1beta2.

This leads me to: can we do this as part of declarative validation?

We already have the +featureGate annotation. Generating code which does the usual "if any feature-gated field is set or feature gate enabled, return, otherwise drop feature-gated fields" logic should be doable.

/cc @alexzielenski @jpbetz @yongruilin

This leads me to: can we do this as part of declarative validation?

Only if we can guarantee we never error / warn for this on an update of an existing object that already contains data in that field

Generating code which does the usual "if any feature-gated field is set or feature gate enabled, return, otherwise drop feature-gated fields" logic should be doable.

Are we able to perfectly automatically detect / correlate use of the gated field in the old object?

Only if we can guarantee we never error / warn for this on an update of an existing object that already contains data in that field

That's what I meant with "if any feature-gated field is set" - just accidentally dropped the "in the existing object".

Are we able to perfectly automatically detect / correlate use of the gated field in the old object?

That's a question to the people implementing declarative validation. We would need access to the stored object in the same version that the incoming update has. Then whatever is set there should match the +featureGate annotations and thus the code generated for that version.

pohly · 2025-02-27T07:58:59Z

staging/src/k8s.io/apiserver/pkg/registry/rest/create.go

+			details.Causes = append(details.Causes, cause)
+		}
+		err.ErrStatus.Details = details
+		return err


This conversion into an error needs to go into some helper function, either one that is shared by create and update or exported such that implementations can use it.

pohly · 2025-02-27T08:09:54Z

pkg/registry/resource/resourceclaim/strategy.go

+			if warn {
+				warnings = append(warnings, fmt.Sprintf("spec.devices.requests[%d].adminAccess: %s disabled", i, features.DRAAdminAccess))
+			}
+		}


As a proof-of-concept I've converted this "drop disabled fields". There's a corresponding E2E test in test/e2e/dra.go.

If we go ahead with this, do I need to update all "drop disabled fields" implementations and write tests for that in this PR? It would be desirable for the sake of consistency, but also a lot of busy work and lead to a very large PR.

The alternative is to add the ability to do this in one PR, then convert individual API types one-by-one. That work can be spread across different contributors.

I'm leaning towards the latter approach. They key argument is that clients cannot rely on getting an error for a while because they probably need to support also older Kubernetes release where this new functionality wasn't available yet. For DRA, we need such a "plan B" for DRA drivers running on Kubernetes 1.32.

k8s-ci-robot · 2025-03-07T09:39:50Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sftim · 2025-03-12T17:55:53Z

If this is relevant to #127701, I suggest we make this a candidate for backports.

It won't cover everything but the fewer times we have to triage a sidecar being treated as an init container (and people filing bugs against the Helm chart, Kubernetes, etc) the better.

pohly · 2025-03-13T08:47:55Z

If this is relevant to #127701, I suggest we make this a candidate for backports.

It is, but it doesn't meet our backport criteria: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-release/cherry-picks.md#what-kind-of-prs-are-good-for-cherry-picks

sftim · 2025-03-13T10:07:38Z

(quoting from the page about backports)

If upon reflection you wish to continue, bolster your case by supplementing your PR with e.g.,

A GitHub issue detailing the problem

Scope of the change

Risks of adding a change

Risks of associated regression

Testing performed, test cases added

Key stakeholder SIG reviewers/approvers attesting to their confidence in the change being a required backport

I'm willing to support this. From my perspective, I want to preempt the problem #127701 calls out (the ecosystem sees a stream of issues from users confused about init vs. sidecar containers). I'd rather have this ahead of GA (for sidecar containers) than feel obliged to jump in to comms after the problem proves to be serious.
However, if it's just me, I wouldn't have capacity to champion the backport. Would SIG Node have the inclination and capacity to promote why a backport is worthwhile? If folks aren't convinced, I'll go quiet on this backport idea.

pohly · 2025-03-13T11:34:08Z

I can't speak for SIG Node, but considering how large this PR might become (note that it's not complete yet!) I expect considerable and justified pushback against backporting this. If I were SIG Node, I wouldn't try. SIG Arch might be a better champion for it, but even then I find it unlikely.

What SIG Node can do is delay graduation until this PR is merged. So far, it is unclear when that will be. It has not received any comments on the implementation questions that I raised.

liggitt · 2025-03-13T18:57:40Z

If this is relevant to #127701, I suggest we make this a candidate for backports.

I don't think there's any way we would backport this. This requires signature changes and touching many many implementation files.

sftim · 2025-03-13T18:59:11Z

Ok, thanks @liggitt.

k8s-triage-robot · 2025-06-24T23:07:52Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

WIP: api: apply field validation to dropped fields

c65d21e

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 27, 2025

k8s-ci-robot requested review from alexzielenski and aojea February 27, 2025 07:38

github-project-automation bot added this to SIG Auth and SIG Apps Feb 27, 2025

k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 27, 2025

github-project-automation bot moved this to Needs Triage in SIG Apps Feb 27, 2025

github-project-automation bot moved this to Needs Triage in SIG Auth Feb 27, 2025

github-project-automation bot added this to SIG Node CI/Test Board Feb 27, 2025

k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Feb 27, 2025

github-project-automation bot moved this to Triage in SIG Node CI/Test Board Feb 27, 2025

k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Feb 27, 2025

github-project-automation bot added this to Dynamic Resource Allocation Feb 27, 2025

github-project-automation bot moved this to 🆕 New in Dynamic Resource Allocation Feb 27, 2025

pohly commented Feb 27, 2025

View reviewed changes

k8s-ci-robot requested review from deads2k, jpbetz and liggitt February 27, 2025 08:13

pohly moved this from 🆕 New to 🏗 In progress in Dynamic Resource Allocation Feb 27, 2025

liggitt added the api-review Categorizes an issue or PR as actively needing an API review. label Feb 27, 2025

github-project-automation bot added this to API Reviews Feb 27, 2025

liggitt moved this to Backlog in API Reviews Feb 27, 2025

SergeyKanzhelev moved this from Triage to Archive-it in SIG Node CI/Test Board Mar 5, 2025

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2025

pohly mentioned this pull request Mar 12, 2025

Sidecar containers can be interpreted as init containers #127701

Open

sftim mentioned this pull request Mar 13, 2025

Sidecar Containers kubernetes/enhancements#753

Closed

15 tasks

k8s-ci-robot requested a review from yongruilin March 24, 2025 11:52

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 24, 2025

github-project-automation bot added this to SIG Node: code and documentation PRs Jun 24, 2025

github-project-automation bot moved this to Triage in SIG Node: code and documentation PRs Jun 24, 2025

bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Jun 25, 2025

WIP: api: apply field validation to dropped fields #130467

Are you sure you want to change the base?

WIP: api: apply field validation to dropped fields #130467

Conversation

pohly commented Feb 27, 2025

What type of PR is this?

What this PR does / why we need it:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Uh oh!

k8s-ci-robot commented Feb 27, 2025

Uh oh!

k8s-ci-robot commented Feb 27, 2025

Uh oh!

k8s-ci-robot commented Feb 27, 2025

Uh oh!

k8s-ci-robot commented Feb 27, 2025

Uh oh!

pohly left a comment

Choose a reason for hiding this comment

Uh oh!

pohly Feb 27, 2025

Choose a reason for hiding this comment

Uh oh!

pohly Feb 27, 2025

Choose a reason for hiding this comment

Uh oh!

liggitt Mar 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pohly Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

liggitt Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pohly Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

pohly Feb 27, 2025

Choose a reason for hiding this comment

Uh oh!

pohly Feb 27, 2025

Choose a reason for hiding this comment

Uh oh!

k8s-ci-robot commented Mar 7, 2025

Uh oh!

sftim commented Mar 12, 2025

Uh oh!

pohly commented Mar 13, 2025

Uh oh!

sftim commented Mar 13, 2025

Uh oh!

pohly commented Mar 13, 2025

Uh oh!

liggitt commented Mar 13, 2025

Uh oh!

sftim commented Mar 13, 2025

Uh oh!

k8s-triage-robot commented Jun 24, 2025

Uh oh!

Uh oh!

liggitt Mar 13, 2025 •

edited

Loading

liggitt Mar 25, 2025 •

edited

Loading