
node authorizer: limit kubelet access to ResourceClaim objects #116254

Merged
merged 3 commits into kubernetes:master on Jul 18, 2023

Conversation

pohly
Contributor

@pohly pohly commented Mar 3, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

As discussed during the KEP and code review of dynamic resource allocation for Kubernetes 1.26, to increase security the kubelet should only get read access to those ResourceClaim objects that are needed by a pod on the node.

This can be enforced easily by the node authorizer (a minimal sketch follows the list below):

  • the names of the objects are defined by the pod status
  • there is no indirection as with volumes (pod -> pvc -> pv)
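
A minimal sketch of the deterministic-naming idea this relies on (illustrative Go only; generatedClaimName is a hypothetical stand-in for the real helper in k8s.io/dynamic-resource-allocation/resourceclaim):

package main

import "fmt"

// generatedClaimName mirrors the scheme: when a pod references a
// ResourceClaimTemplate, the ResourceClaim created for it gets a name derived
// purely from the pod name and the in-pod claim entry name, so both kubelet
// and the node authorizer can compute it without extra lookups.
func generatedClaimName(podName, podClaimName string) string {
	return podName + "-" + podClaimName
}

func main() {
	// The node authorizer can allow "get resourceclaims/my-pod-gpu" only for
	// the node that runs pod "my-pod", because the name is predictable.
	fmt.Println(generatedClaimName("my-pod", "gpu")) // my-pod-gpu
}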

Which issue(s) this PR fixes:

Fixes #113866

Does this PR introduce a user-facing change?

kubelet: security of dynamic resource allocation was enhanced by limiting node access to those objects that are needed on the node.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/3063

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress, release-note, kind/feature, size/L, and cncf-cla: yes labels Mar 3, 2023
@pohly pohly marked this pull request as ready for review March 3, 2023 13:50
@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-sig, needs-triage, and needs-priority, and removed do-not-merge/work-in-progress labels Mar 3, 2023
@k8s-ci-robot k8s-ci-robot added sig/auth and removed do-not-merge/needs-sig labels Mar 3, 2023
@pohly
Contributor Author

pohly commented Mar 8, 2023

Note that there is also an issue open to investigate whether the deterministic naming of generated ResourceClaim objects, something that this PR depends upon, can be replaced.

Suppose the name of the ResourceClaim for a pod was generated and "the right one" for a pod had a pod.k8s.io=<pod name> label (details TBD). Would the node authorizer still be able to check access? Perhaps by adding graph edges when a ResourceClaim with such a label gets added?

But then how would kubelet learn about the right ResourceClaim? Can it watch "all" ResourceClaims with an informer and will it then only receive those that it has access to, if node authorizer checks that? I suppose that must work, because kubelet also needs to be informed efficiently about pods that were scheduled onto its node.
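
For illustration, a label-filtered watch on ResourceClaims could be set up roughly like this (a sketch only; the pod.k8s.io label key is the placeholder from the comment above, and the function name is made up):

package sketch

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newLabelFilteredFactory builds an informer factory whose list/watch calls
// carry a label selector. Every kubelet would still open a cluster-wide watch
// on ResourceClaims; the apiserver merely filters the events it sends back.
func newLabelFilteredFactory(client kubernetes.Interface, podName string) informers.SharedInformerFactory {
	return informers.NewFilteredSharedInformerFactory(client, 10*time.Minute, metav1.NamespaceAll,
		func(opts *metav1.ListOptions) {
			opts.LabelSelector = "pod.k8s.io=" + podName
		})
}

A ResourceClaim informer obtained from such a factory would only deliver objects matching the selector; whether the node authorizer could also restrict such a watch is the open question here.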

@liggitt
Member

liggitt commented Mar 10, 2023

But then how would kubelet learn about the right ResourceClaim? Can it watch "all" ResourceClaims with an informer and will it then only receive those that it has access to, if node authorizer checks that? I suppose that must work, because kubelet also needs to be informed efficiently about pods that were scheduled onto its node.

Pods are not narrowly authorized, actually... any node can list/watch all pods, they just happen to filter with a field selector to just the pods they care about. Field selectors are not in the set of information considered by authorization.

Pods are also a special case for scaling watch... they have a bespoke watch indexer hard-coded into the apiserver to make filtering by node name performant. I would not assume thousands of nodes watching all resource claims filtering by a label would perform well generically.

If the name can be deterministic, that would be much better for both kubelet discovery and authorization.
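
For reference, the pod-watch narrowing mentioned above works roughly like this (a sketch, not the actual kubelet config code): a field selector on spec.nodeName, which authorization never inspects but which the apiserver serves from a dedicated per-node watch index.

package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
)

// podListOptionsForNode returns list/watch options that select only the pods
// bound to the given node. The selector is a performance filter, not an
// authorization boundary: the node is still authorized to list/watch all pods.
func podListOptionsForNode(nodeName string) metav1.ListOptions {
	return metav1.ListOptions{
		FieldSelector: fields.OneTermEqualSelector("spec.nodeName", nodeName).String(),
	}
}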

@pohly
Contributor Author

pohly commented Mar 10, 2023

Pods are not narrowly authorized, actually...

Isn't that a pretty big gap in the security design? Limiting access to the other types is nice, but a compromised kubelet can still do a lot of damage when it can read/write any pod. But that's just an aside.

I would not assume thousands of nodes watching all resource claims filtering by a label would perform well generically.

Suppose the node authorizer was modified to do authorization for ResourceClaim by label. Would a watch then a) send only authorized objects and b) would that be efficient?

@liggitt
Member

liggitt commented Mar 10, 2023

when it can read/write any pod

it can only write pods assigned to it, it can read any pod

Suppose the node authorizer was modified to do authorization for ResourceClaim by label.

Authorizers don't have access to label selectors on incoming requests. Xref discussion in #44703 (comment)

Would a watch then a) send only authorized objects

A watch could send objects that only matched a label selector, but authorization couldn't force a specific label selector

and b) would that be efficient?

A cluster-wide watch that sets a label selector that matches a fraction of the existing entries is not efficient. Every node starting a watch like this on an API that is high volume and fast moving (potentially 1:1 with pods) sounds problematic
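
To make the first point concrete: an authorizer only sees request attributes such as verb, group/resource, namespace, and object name, so it can allow or deny a ResourceClaim get by name, but it cannot require that a list/watch carry a particular label selector. A simplified sketch (not the node authorizer itself, which walks its access graph instead; the type name is made up):

package sketch

import (
	"context"

	"k8s.io/apiserver/pkg/authorization/authorizer"
)

// claimAuthorizerSketch illustrates what information an authorizer has to work with.
type claimAuthorizerSketch struct{}

var _ authorizer.Authorizer = claimAuthorizerSketch{}

func (claimAuthorizerSketch) Authorize(ctx context.Context, attrs authorizer.Attributes) (authorizer.Decision, string, error) {
	if attrs.GetResource() == "resourceclaims" && attrs.GetVerb() == "get" && attrs.GetName() != "" {
		// Per-object decisions by name are possible...
		return authorizer.DecisionAllow, "named claim", nil
	}
	// ...but a cluster-wide list/watch arrives with no object name and no
	// selector attribute, so it is all-or-nothing for the whole resource.
	return authorizer.DecisionNoOpinion, "", nil
}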

@pohly
Contributor Author

pohly commented Mar 10, 2023

This all sounds like a convincing argument for keeping the current design with predictable names.

It also means that this PR is ready for review and merging because we won't change the underlying mechanism.

@@ -393,6 +396,16 @@ func (g *Graph) AddPod(pod *corev1.Pod) {
			g.addEdgeToDestinationIndex_locked(e)
		}
	}

	for _, podResourceClaim := range pod.Spec.ResourceClaims {
		claimName := resourceclaim.Name(pod, &podResourceClaim)
Member

it looks like resourceclaim.Name returns pod.Name + claim.Name ... since both of those can contain - characters, the following would conflict, right?

  • pod foo, claim bar-baz
  • pod foo-bar, claim baz

I also don't see validation that ensures any of the following:

  • ResourceClaimName is valid (passes the same validation we apply in ValidateClaim)
  • ResourceClaimTemplateName is valid (passes the same validation we apply in ValidateClaimTemplate)
  • the combined pod.Name + "-" + claim.Name is valid (passes the same validation we apply in ValidateClaim, especially length requirements)
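
A quick demonstration of the collision described above, assuming the generated name is pod.Name + "-" + claim.Name (a sketch of the scheme, not the actual resourceclaim.Name code):

package main

import "fmt"

func main() {
	fmt.Println("foo" + "-" + "bar-baz") // foo-bar-baz
	fmt.Println("foo-bar" + "-" + "baz") // foo-bar-baz -- same generated claim name
}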

Contributor Author

The conflict between foo + bar-baz and foo-bar + baz was mentioned before. As both pods come from the same namespace, the user would be responsible for ensuring that no such clash occurs. Would it a) be possible to tighten the validation and b) be useful?

The other points were also brought up before. The problem with adding such validation was that core/validation would then share code with or depend on resource/validation. At one point I had ValidateResourceClaimName or something like it in core/validation and was asked to only have and use it in resource/validation because it didn't belong in core.

Member

The conflict between foo + bar-baz and foo-bar + baz was mentioned before. As both pods come from the same namespace, the user would be responsible for ensuring that no such clash occurs. Would it a) be possible to tighten the validation and b) be useful?

I don't see why we would expose the user to sharp edges like that if we can avoid it with requirements for templated / directly used claim object names

Member

The problem with adding such validation was that core/validation would then share code with or depend on resource/validation. At one point I had ValidateResourceClaimName or something like it in core/validation and was asked to only have and use it in resource/validation because it didn't belong in core.

We should validate the fields we have. If we have a reference in core, we should validate it. I don't especially care if the name validation function lives in core and is referenced from resource/validation, or if we duplicate it.

Contributor Author

I don't see why we would expose the user to sharp edges like that if we can avoid it with requirements for templated / directly used claim object names

So that's a "yes" for not allowing "-" in the pod.spec.resourceClaims[*].name field, right? Is it okay to tighten that validation in 1.27? I think it's okay as we are making some other "action required" changes, I just wanted to be sure.

We should validate the fields we have. If we have a reference in core, we should validate it.

/cc @thockin

I think you were the one who argued that core shouldn't validate against requirements defined by the v1alpha1 resource API.

I can add it back if the consensus is that we should do that.

Member

a kubelet on a specific node won't know which ResourceClaim it needs to read for a pod

OK, that's a fair point - ideally "The set of resources which need to be read in order to run a pod are directly linked by the pod". Good principle.

Storing the generated name in the status would work, but does anyone here really have the motivation for another core/v1 API change?

I do this all day. v1 changes aren't SO bad given that you already have a gate in place.

@liggitt what think?

Member

@liggitt liggitt Mar 13, 2023

ideally "The set of resources which need to be read in order to run a pod are directly linked by the pod". Good principle.

Yeah, I like that. Either deterministic mappings or explicit references from pod spec/status work in my view.

Putting the name of the spawned claim(s) in pod status for kubelets to discover and for the node authorizer to authorize on could work. NodeRestriction admission would need an update to disallow nodes adding names of claims in pod status, but that's not hard.

I don't have a good sense for the trust level we're placing in the controller(s) handling these claim objects and whether we'd be happy letting those controller(s) write to status of arbitrary pods. I haven't followed the design enough to know if we expect a single central controller running on the control plane (which we'd be happy to let write pod status), or if this is a distributed bring-your-own-controller model (where we would not want to hand out pod status update permissions like candy).
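
For reference, an explicit status reference could take roughly this shape (a sketch of the option being discussed, close to but not necessarily identical with what core/v1 later added as status.resourceClaimStatuses):

package sketch

// PodResourceClaimStatus records, per entry in pod.spec.resourceClaims, the
// name of the ResourceClaim object that was generated for it (if any). The
// kubelet would read it to find the claim; the node authorizer would use it
// to decide which claims a node may get.
type PodResourceClaimStatus struct {
	// Name matches pod.spec.resourceClaims[*].name.
	Name string
	// ResourceClaimName is the real object name of the generated claim,
	// or nil if no claim was generated for this entry.
	ResourceClaimName *string
}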

Contributor

@klueska klueska Mar 13, 2023

It's a bring-your-own-controller model. However, I don't think they would need write permission to the PodStatus (just read permission).

Member

@liggitt liggitt Mar 13, 2023

sorry, I meant the controller creating the resourceclaim object and updating pod status (so the node and node authorizer could allow / get the claim object)

Contributor

Right -- that's launched by the kube-controller-manager and implemented here:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourceclaim/controller.go

@pohly
Contributor Author

pohly commented Mar 13, 2023

I'd like to propose that we not merge this PR for 1.27. There's clearly a need for further discussion.

What may be useful now is adding the validation of the generated name and object references, without changing how things work.

pohly added a commit to pohly/kubernetes that referenced this pull request Mar 14, 2023
The generated ResourceClaim name and the names of the ResourceClaimTemplate and
ResourceClaim referenced by a pod must be valid according to the resource API,
otherwise the pod cannot start.

Checking this was removed from the original implementation out of concerns
about validating fields in core against limitations imposed by a separate,
alpha API.  But as this was pointed out again in
kubernetes#116254 (comment)
it gets added back.

The same strings that worked before still work now. In particular, the
constraints for a spec.resourceClaim.name are still the same (DNS label).
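
A sketch of the kind of check that commit describes (hypothetical helper, not the actual pkg/apis/core/validation code):

package sketch

import (
	apimachineryvalidation "k8s.io/apimachinery/pkg/api/validation"
)

// validateGeneratedClaimName checks that the name derived from the pod name
// and the in-pod claim entry name would be a valid object name; if it is not,
// the generated ResourceClaim could never be created and the pod could never start.
func validateGeneratedClaimName(podName, podClaimName string) []string {
	generated := podName + "-" + podClaimName
	return apimachineryvalidation.NameIsDNSSubdomain(generated, false)
}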
@pohly pohly changed the title from "node authorizer: limit kubelet access to ResourceClaim objects" to "WIP: node authorizer: limit kubelet access to ResourceClaim objects" Mar 14, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Mar 14, 2023
@liggitt liggitt added this to the v1.28 milestone Mar 30, 2023
@pohly pohly changed the title from "node authorizer: limit kubelet access to ResourceClaim objects" to "WIP: node authorizer: limit kubelet access to ResourceClaim objects" Jul 13, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Jul 13, 2023
@liggitt
Member

liggitt commented Jul 13, 2023

kubelet no longer has read-access to the resource claim of that pod, but tries to get it anyway and fails

yeah, we don't want to block cleanup of removed pods on unavailability of related resources (those could fail for lots of reasons, being forbidden, being deleted, etc)

pohly and others added 2 commits July 13, 2023 20:42
The status determines which claims kubelet is allowed to access when claims get
created from a template. Therefore kubelet must not be allowed to modify that
part of the status, because otherwise it could add an entry and then gain
access to a claim it should not have access to.
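
A sketch of the restriction that commit message describes (illustrative only, not the actual NodeRestriction admission code; it assumes the status.resourceClaimStatuses field that core/v1 ended up with):

package sketch

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// checkNodeStatusUpdate rejects a status update coming from a node if it
// changes which ResourceClaims the pod status points at; otherwise the node
// could add an entry and grant itself read access to an arbitrary claim.
func checkNodeStatusUpdate(oldPod, newPod *corev1.Pod) error {
	if !apiequality.Semantic.DeepEqual(oldPod.Status.ResourceClaimStatuses, newPod.Status.ResourceClaimStatuses) {
		return fmt.Errorf("node may not modify pod status resourceClaimStatuses")
	}
	return nil
}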
@k8s-ci-robot k8s-ci-robot added area/test and sig/testing labels Jul 13, 2023
@liggitt
Member

liggitt commented Jul 13, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 13, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b83a9068c3be1cf8b969663c3fb7fea31d6ef996

@pohly
Contributor Author

pohly commented Jul 13, 2023

@bart0sh: can you help get this PR merged?

  • get your kubelet checkpointing changes merged to get rid of the forbidden GET
  • /test pull-kubernetes-kind-dra
  • lift the hold when that is passing

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label Jul 13, 2023
@pohly pohly changed the title from "WIP: node authorizer: limit kubelet access to ResourceClaim objects" to "node authorizer: limit kubelet access to ResourceClaim objects" Jul 13, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Jul 13, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jul 13, 2023
@bart0sh
Contributor

bart0sh commented Jul 13, 2023

@pohly sure, here is my PR: #119307
I'll unhold this one when mine is merged.
Should it be rebased after mine is merged?

@bart0sh
Contributor

bart0sh commented Jul 18, 2023

/test pull-kubernetes-kind-dra

@bart0sh
Contributor

bart0sh commented Jul 18, 2023

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label Jul 18, 2023
@bart0sh
Contributor

bart0sh commented Jul 18, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted and removed needs-triage labels Jul 18, 2023
@k8s-ci-robot k8s-ci-robot merged commit f55f278 into kubernetes:master Jul 18, 2023
13 of 14 checks passed
rayowang pushed a commit to rayowang/kubernetes that referenced this pull request Feb 9, 2024