
node authorizer: limit kubelet access to ResourceClaim objects #116254

Merged
merged 3 commits into kubernetes:master on Jul 18, 2023

Conversation

pohly
Contributor

@pohly pohly commented Mar 3, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

As discussed during the KEP and code review of dynamic resource allocation for Kubernetes 1.26, to increase security the kubelet should only get read access to those ResourceClaim objects that are needed by a pod on the node.

This can be enforced easily by the node authorizer (a minimal sketch follows the list below):

  • the names of the objects are defined by the pod status
  • there is no indirection as with volumes (pod -> pvc -> pv)
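
A minimal sketch of the deterministic-naming idea this relies on (illustrative Go only; generatedClaimName is a hypothetical stand-in for the real helper in k8s.io/dynamic-resource-allocation/resourceclaim):

package main

import "fmt"

// generatedClaimName mirrors the scheme: when a pod references a
// ResourceClaimTemplate, the ResourceClaim created for it gets a name derived
// purely from the pod name and the in-pod claim entry name, so both kubelet
// and the node authorizer can compute it without extra lookups.
func generatedClaimName(podName, podClaimName string) string {
	return podName + "-" + podClaimName
}

func main() {
	// The node authorizer can allow "get resourceclaims/my-pod-gpu" only for
	// the node that runs pod "my-pod", because the name is predictable.
	fmt.Println(generatedClaimName("my-pod", "gpu")) // my-pod-gpu
}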

Which issue(s) this PR fixes:

Fixes #113866

Does this PR introduce a user-facing change?

kubelet: security of dynamic resource allocation was enhanced by limiting node access to those objects that are needed on the node.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/3063

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress, release-note, kind/feature, size/L, and cncf-cla: yes labels Mar 3, 2023
@pohly pohly marked this pull request as ready for review March 3, 2023 13:50
@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-sig, needs-triage, and needs-priority, and removed do-not-merge/work-in-progress labels Mar 3, 2023
@k8s-ci-robot k8s-ci-robot added sig/auth and removed do-not-merge/needs-sig labels Mar 3, 2023
@pohly
Contributor Author

pohly commented Mar 8, 2023

Note that there is also an issue open to investigate whether the deterministic naming of generated ResourceClaim objects, something that this PR depends upon, can be replaced.

Suppose the name of the ResourceClaim for a pod was generated and "the right one" for a pod had a pod.k8s.io=<pod name> label (details TBD). Would the node authorizer still be able to check access? Perhaps by adding graph edges when a ResourceClaim with such a label gets added?

But then how would kubelet learn about the right ResourceClaim? Can it watch "all" ResourceClaims with an informer and will it then only receive those that it has access to, if node authorizer checks that? I suppose that must work, because kubelet also needs to be informed efficiently about pods that were scheduled onto its node.
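
For illustration, a label-filtered watch on ResourceClaims could be set up roughly like this (a sketch only; the pod.k8s.io label key is the placeholder from the comment above, and the function name is made up):

package sketch

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newLabelFilteredFactory builds an informer factory whose list/watch calls
// carry a label selector. Every kubelet would still open a cluster-wide watch
// on ResourceClaims; the apiserver merely filters the events it sends back.
func newLabelFilteredFactory(client kubernetes.Interface, podName string) informers.SharedInformerFactory {
	return informers.NewFilteredSharedInformerFactory(client, 10*time.Minute, metav1.NamespaceAll,
		func(opts *metav1.ListOptions) {
			opts.LabelSelector = "pod.k8s.io=" + podName
		})
}

A ResourceClaim informer obtained from such a factory would only deliver objects matching the selector; whether the node authorizer could also restrict such a watch is the open question here.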

@liggitt
Member

liggitt commented Mar 10, 2023

But then how would kubelet learn about the right ResourceClaim? Can it watch "all" ResourceClaims with an informer and will it then only receive those that it has access to, if node authorizer checks that? I suppose that must work, because kubelet also needs to be informed efficiently about pods that were scheduled onto its node.

Pods are not narrowly authorized, actually... any node can list/watch all pods, they just happen to filter with a field selector to just the pods they care about. Field selectors are not in the set of information considered by authorization.

Pods are also a special case for scaling watch... they have a bespoke watch indexer hard-coded into the apiserver to make filtering by node name performant. I would not assume thousands of nodes watching all resource claims filtering by a label would perform well generically.

If the name can be deterministic, that would be much better for both kubelet discovery and authorization.
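
For reference, the pod-watch narrowing mentioned above works roughly like this (a sketch, not the actual kubelet config code): a field selector on spec.nodeName, which authorization never inspects but which the apiserver serves from a dedicated per-node watch index.

package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
)

// podListOptionsForNode returns list/watch options that select only the pods
// bound to the given node. The selector is a performance filter, not an
// authorization boundary: the node is still authorized to list/watch all pods.
func podListOptionsForNode(nodeName string) metav1.ListOptions {
	return metav1.ListOptions{
		FieldSelector: fields.OneTermEqualSelector("spec.nodeName", nodeName).String(),
	}
}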

@pohly
Contributor Author

pohly commented Mar 10, 2023

Pods are not narrowly authorized, actually...

Isn't that a pretty big gap in the security design? Limiting access to the other types is nice, but a compromised kubelet can still do a lot of damage when it can read/write any pod. But that's just an aside.

I would not assume thousands of nodes watching all resource claims filtering by a label would perform well generically.

Suppose the node authorizer was modified to do authorization for ResourceClaim by label. Would a watch then a) send only authorized objects and b) would that be efficient?

@liggitt
Member

liggitt commented Mar 10, 2023

when it can read/write any pod

it can only write pods assigned to it, it can read any pod

Suppose the node authorizer was modified to do authorization for ResourceClaim by label.

Authorizers don't have access to label selectors on incoming requests. Xref discussion in #44703 (comment)

Would a watch then a) send only authorized objects

A watch could send objects that only matched a label selector, but authorization couldn't force a specific label selector

and b) would that be efficient?

A cluster-wide watch that sets a label selector that matches a fraction of the existing entries is not efficient. Every node starting a watch like this on an API that is high volume and fast moving (potentially 1:1 with pods) sounds problematic
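
To make the first point concrete: an authorizer only sees request attributes such as verb, group/resource, namespace, and object name, so it can allow or deny a ResourceClaim get by name, but it cannot require that a list/watch carry a particular label selector. A simplified sketch (not the node authorizer itself, which walks its access graph instead; the type name is made up):

package sketch

import (
	"context"

	"k8s.io/apiserver/pkg/authorization/authorizer"
)

// claimAuthorizerSketch illustrates what information an authorizer has to work with.
type claimAuthorizerSketch struct{}

var _ authorizer.Authorizer = claimAuthorizerSketch{}

func (claimAuthorizerSketch) Authorize(ctx context.Context, attrs authorizer.Attributes) (authorizer.Decision, string, error) {
	if attrs.GetResource() == "resourceclaims" && attrs.GetVerb() == "get" && attrs.GetName() != "" {
		// Per-object decisions by name are possible...
		return authorizer.DecisionAllow, "named claim", nil
	}
	// ...but a cluster-wide list/watch arrives with no object name and no
	// selector attribute, so it is all-or-nothing for the whole resource.
	return authorizer.DecisionNoOpinion, "", nil
}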

@pohly
Contributor Author

pohly commented Mar 10, 2023

This all sounds like a convincing argument for keeping the current design with predictable names.

It also means that this PR is ready for review and merging because we won't change the underlying mechanism.

@@ -393,6 +396,16 @@ func (g *Graph) AddPod(pod *corev1.Pod) {
			g.addEdgeToDestinationIndex_locked(e)
		}
	}

	for _, podResourceClaim := range pod.Spec.ResourceClaims {
		claimName := resourceclaim.Name(pod, &podResourceClaim)
Member

it looks like resourceclaim.Name returns pod.Name + claim.Name ... since both of those can contain - characters, the following would conflict, right?

  • pod foo, claim bar-baz
  • pod foo-bar, claim baz

I also don't see validation that ensures any of the following:

  • ResourceClaimName is valid (passes the same validation we apply in ValidateClaim)
  • ResourceClaimTemplateName is valid (passes the same validation we apply in ValidateClaimTemplate)
  • the combined pod.Name + "-" + claim.Name is valid (passes the same validation we apply in ValidateClaim, especially length requirements)
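
A quick demonstration of the collision described above, assuming the generated name is pod.Name + "-" + claim.Name (a sketch of the scheme, not the actual resourceclaim.Name code):

package main

import "fmt"

func main() {
	fmt.Println("foo" + "-" + "bar-baz") // foo-bar-baz
	fmt.Println("foo-bar" + "-" + "baz") // foo-bar-baz -- same generated claim name
}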

Contributor Author

The conflict between foo + bar-baz and foo-bar + baz was mentioned before. As both pods come from the same namespace, the user would be responsible for ensuring that no such clash occurs. Would it a) be possible to tighten the validation and b) be useful?

The other points were also brought up before. The problem with adding such validation was that core/validation would then share code with or depend on resource/validation. At one point I had ValidateResourceClaimName or something like it in core/validation and was asked to only have and use it in resource/validation because it didn't belong in core.

Member

The conflict between foo + bar-baz and foo-bar + baz was mentioned before. As both pods come from the same namespace, the user would be responsible for ensuring that no such clash occurs. Would it a) be possible to tighten the validation and b) be useful?

I don't see why we would expose the user to sharp edges like that if we can avoid it with requirements for templated / directly used claim object names

Member

The problem with adding such validation was that core/validation would then share code with or depend on resource/validation. At one point I had ValidateResourceClaimName or something like it in core/validation and was asked to only have and use it in resource/validation because it didn't belong in core.

We should validate the fields we have. If we have a reference in core, we should validate it. I don't especially care if the name validation function lives in core and is referenced from resource/validation, or if we duplicate it.

Contributor Author

I don't see why we would expose the user to sharp edges like that if we can avoid it with requirements for templated / directly used claim object names

So that's a "yes" for not allowing "-" in the pod.spec.resourceClaims[*].name field, right? Is it okay to tighten that validation in 1.27? I think it's okay as we are making some other "action required" changes, I just wanted to be sure.

We should validate the fields we have. If we have a reference in core, we should validate it.

/cc @thockin

I think you were the one who argued that core shouldn't validate against requirements defined by the v1alpha1 resource API.

I can add it back if the consensus is that we should do that.

Member

a kubelet on a specific node won't know which ResourceClaim it needs to read for a pod

OK, that's a fair point - ideally "The set of resources which need to be read in order to run a pod are directly linked by the pod". Good principle.

Storing the generated name in the status would work, but does anyone here really have the motivation for another core/v1 API change?

I do this all day. v1 changes aren't SO bad given that you already have a gate in place.

@liggitt what think?

Member

@liggitt liggitt Mar 13, 2023

ideally "The set of resources which need to be read in order to run a pod are directly linked by the pod". Good principle.

Yeah, I like that. Either deterministic mappings or explicit references from pod spec/status work in my view.

Putting the name of the spawned claim(s) in pod status for kubelets to discover and for the node authorizer to authorize on could work. NodeRestriction admission would need an update to disallow nodes adding names of claims in pod status, but that's not hard.

I don't have a good sense for the trust level we're placing in the controller(s) handling these claim objects and whether we'd be happy letting those controller(s) write to status of arbitrary pods. I haven't followed the design enough to know if we expect a single central controller running on the control plane (which we'd be happy to let write pod status), or if this is a distributed bring-your-own-controller model (where we would not want to hand out pod status update permissions like candy).
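
For reference, an explicit status reference could take roughly this shape (a sketch of the option being discussed, close to but not necessarily identical with what core/v1 later added as status.resourceClaimStatuses):

package sketch

// PodResourceClaimStatus records, per entry in pod.spec.resourceClaims, the
// name of the ResourceClaim object that was generated for it (if any). The
// kubelet would read it to find the claim; the node authorizer would use it
// to decide which claims a node may get.
type PodResourceClaimStatus struct {
	// Name matches pod.spec.resourceClaims[*].name.
	Name string
	// ResourceClaimName is the real object name of the generated claim,
	// or nil if no claim was generated for this entry.
	ResourceClaimName *string
}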

Contributor

@klueska klueska Mar 13, 2023

It's a bring-your-own-controller model. However, I don't think they would need write permission to the PodStatus (just read permission).

Member

@liggitt liggitt Mar 13, 2023

sorry, I meant the controller creating the resourceclaim object and updating pod status (so the node and node authorizer could allow / get the claim object)

Contributor

Right -- that's launched by the kube-controller-manager and implemented here:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourceclaim/controller.go

@pohly
Contributor Author

pohly commented Mar 13, 2023

I'd like to propose that we not merge this PR for 1.27. There's clearly a need for further discussion.

What may be useful now is adding the validation of the generated name and object references, without changing how things work.

pohly added a commit to pohly/kubernetes that referenced this pull request Mar 14, 2023
The generated ResourceClaim name and the names of the ResourceClaimTemplate and
ResourceClaim referenced by a pod must be valid according to the resource API,
otherwise the pod cannot start.

Checking this was removed from the original implementation out of concerns
about validating fields in core against limitations imposed by a separate,
alpha API.  But as this was pointed out again in
kubernetes#116254 (comment)
it gets added back.

The same strings that worked before still work now. In particular, the
constraints for a spec.resourceClaim.name are still the same (DNS label).
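
A sketch of the kind of check that commit describes (hypothetical helper, not the actual pkg/apis/core/validation code):

package sketch

import (
	apimachineryvalidation "k8s.io/apimachinery/pkg/api/validation"
)

// validateGeneratedClaimName checks that the name derived from the pod name
// and the in-pod claim entry name would be a valid object name; if it is not,
// the generated ResourceClaim could never be created and the pod could never start.
func validateGeneratedClaimName(podName, podClaimName string) []string {
	generated := podName + "-" + podClaimName
	return apimachineryvalidation.NameIsDNSSubdomain(generated, false)
}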
@pohly pohly changed the title from "node authorizer: limit kubelet access to ResourceClaim objects" to "WIP: node authorizer: limit kubelet access to ResourceClaim objects" Mar 14, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Mar 14, 2023
@liggitt liggitt added this to the v1.28 milestone Mar 30, 2023
@pohly pohly changed the title from "node authorizer: limit kubelet access to ResourceClaim objects" to "WIP: node authorizer: limit kubelet access to ResourceClaim objects" Jul 13, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Jul 13, 2023
@liggitt
Member

liggitt commented Jul 13, 2023

kubelet no longer has read-access to the resource claim of that pod, but tries to get it anyway and fails

yeah, we don't want to block cleanup of removed pods on unavailability of related resources (those could fail for lots of reasons, being forbidden, being deleted, etc)

pohly and others added 2 commits July 13, 2023 20:42
The status determines which claims kubelet is allowed to access when claims get
created from a template. Therefore kubelet must not be allowed to modify that
part of the status, because otherwise it could add an entry and then gain
access to a claim it should not have access to.
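
A sketch of the restriction that commit message describes (illustrative only, not the actual NodeRestriction admission code; it assumes the status.resourceClaimStatuses field that core/v1 ended up with):

package sketch

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// checkNodeStatusUpdate rejects a status update coming from a node if it
// changes which ResourceClaims the pod status points at; otherwise the node
// could add an entry and grant itself read access to an arbitrary claim.
func checkNodeStatusUpdate(oldPod, newPod *corev1.Pod) error {
	if !apiequality.Semantic.DeepEqual(oldPod.Status.ResourceClaimStatuses, newPod.Status.ResourceClaimStatuses) {
		return fmt.Errorf("node may not modify pod status resourceClaimStatuses")
	}
	return nil
}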
@k8s-ci-robot k8s-ci-robot added area/test and sig/testing labels Jul 13, 2023
@liggitt
Member

liggitt commented Jul 13, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 13, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b83a9068c3be1cf8b969663c3fb7fea31d6ef996

@pohly
Contributor Author

pohly commented Jul 13, 2023

@bart0sh: can you help get this PR merged?

  • get your kubelet checkpointing changes merged to get rid of the forbidden GET
  • /test pull-kubernetes-kind-dra
  • lift the hold when that is passing

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label Jul 13, 2023
@pohly pohly changed the title from "WIP: node authorizer: limit kubelet access to ResourceClaim objects" to "node authorizer: limit kubelet access to ResourceClaim objects" Jul 13, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Jul 13, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jul 13, 2023
@bart0sh
Contributor

bart0sh commented Jul 13, 2023

@pohly sure, here is my PR: #119307
I'll unhold this one when mine is merged.
Should it be rebased after mine is merged?

@bart0sh
Contributor

bart0sh commented Jul 18, 2023

/test pull-kubernetes-kind-dra

@bart0sh
Contributor

bart0sh commented Jul 18, 2023

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label Jul 18, 2023
@bart0sh
Contributor

bart0sh commented Jul 18, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted and removed needs-triage labels Jul 18, 2023
@k8s-ci-robot k8s-ci-robot merged commit f55f278 into kubernetes:master Jul 18, 2023
13 of 14 checks passed
rayowang pushed a commit to rayowang/kubernetes that referenced this pull request Feb 9, 2024