node authorizer: limit kubelet access to ResourceClaim objects #116254
Conversation
Skipping CI for Draft Pull Request.
Note that there is also an open issue to investigate whether the deterministic naming of generated ResourceClaim objects, something that this PR depends upon, can be replaced. Suppose the name of the ResourceClaim for a pod was generated and "the right one" for a pod had a label identifying that pod. But then how would kubelet learn about the right ResourceClaim? Could it watch "all" ResourceClaims with an informer, and would it then only receive those that it has access to, if the node authorizer checks that? I suppose that must work, because kubelet also needs to be informed efficiently about pods that were scheduled onto its node.
Pods are not narrowly authorized, actually... any node can list/watch all pods; they just happen to filter with a field selector to just the pods they care about. Field selectors are not in the set of information considered by authorization.

Pods are also a special case for scaling watch... they have a bespoke watch indexer hard-coded into the apiserver to make filtering by node name performant. I would not assume thousands of nodes watching all resource claims filtering by a label would perform well generically.

If the name can be deterministic, that would be much better, both for the kubelet discovering the claim and for authorizing access to it.
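The deterministic-name approach favored above can be sketched roughly as follows. This is a hypothetical simplification: the real helper is `resourceclaim.Name` (in `k8s.io/dynamic-resource-allocation/resourceclaim`) and takes the pod and claim objects rather than plain strings, but the key property is the same — any component that knows the pod can compute the claim name without extra lookups.

```go
package main

import "fmt"

// generatedClaimName sketches the deterministic naming scheme discussed
// above: the ResourceClaim name is derived purely from the pod name and
// the entry in pod.spec.resourceClaims, so the resourceclaim controller,
// the scheduler, kubelet, and the node authorizer can all compute it
// independently. Simplified stand-in, not the real API.
func generatedClaimName(podName, podClaimName string) string {
	return podName + "-" + podClaimName
}

func main() {
	fmt.Println(generatedClaimName("my-pod", "gpu")) // my-pod-gpu
}
```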
Isn't that a pretty big gap in the security design? Limiting access to the other types is nice, but a compromised kubelet can still do a lot of damage when it can read/write any pod. But that's just an aside.
Suppose the node authorizer was modified to do authorization for ResourceClaim by label. Would a watch then a) send only authorized objects, and b) be efficient?
It can only write pods assigned to it; it can read any pod.
Authorizers don't have access to label selectors on incoming requests. Xref discussion in #44703 (comment)
A watch could send objects that only matched a label selector, but authorization couldn't force a specific label selector
A cluster-wide watch that sets a label selector matching only a fraction of the existing entries is not efficient. Every node starting a watch like this on an API that is high-volume and fast-moving (potentially 1:1 with pods) sounds problematic.
This all sounds like convincing arguments for keeping the current design with predictable names. It also means that this PR is ready for review and merging, because we won't change the underlying mechanism.
@@ -393,6 +396,16 @@ func (g *Graph) AddPod(pod *corev1.Pod) {
 			g.addEdgeToDestinationIndex_locked(e)
 		}
 	}

+	for _, podResourceClaim := range pod.Spec.ResourceClaims {
+		claimName := resourceclaim.Name(pod, &podResourceClaim)
It looks like resourceclaim.Name returns pod.Name + "-" + claim.Name ... since both of those can contain "-" characters, the following would conflict, right?
- pod foo, claim bar-baz
- pod foo-bar, claim baz

I also don't see validation that ensures any of the following:
- ResourceClaimName is valid (passes the same validation we apply in ValidateClaim)
- ResourceClaimTemplateName is valid (passes the same validation we apply in ValidateClaimTemplate)
- the combined pod.Name + "-" + claim.Name is valid (passes the same validation we apply in ValidateClaim, especially length requirements)
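The collision described above can be demonstrated with a tiny sketch of the naming scheme (a simplified stand-in for resourceclaim.Name, using plain strings instead of API objects):

```go
package main

import "fmt"

// generatedClaimName is a simplified stand-in for resourceclaim.Name:
// pod name and per-pod claim name joined with "-".
func generatedClaimName(podName, podClaimName string) string {
	return podName + "-" + podClaimName
}

func main() {
	// Pod "foo" with claim "bar-baz" vs. pod "foo-bar" with claim "baz":
	// because "-" is legal in both components, the generated names collide.
	a := generatedClaimName("foo", "bar-baz")
	b := generatedClaimName("foo-bar", "baz")
	fmt.Println(a, b, a == b) // foo-bar-baz foo-bar-baz true
}
```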
The conflict between foo + bar-baz and foo-bar + baz was mentioned before. As both pods come from the same namespace, the user would be responsible for ensuring that no such clash occurs. Would it be a) possible to tighten the validation and b) useful?

The other points were also brought up before. The problem with adding such validation was that core/validation then shares code with or depends on resource/validation. At one point I had ValidateResourceClaimName or something like it in core/validation and was asked to only have and use it in resource/validation because it didn't belong in the core.
The conflict between foo + bar-baz and foo-bar + baz was mentioned before. As both pods come from the same namespace, the user would be responsible for ensuring that no such clash occurs. Would it be a) possible to tighten the validation and b) useful?
I don't see why we would expose the user to sharp edges like that if we can avoid it with requirements for templated / directly used claim object names
The problem with adding such validation was that core/validation then shares code with or depends on resource/validation. At one point I had ValidateResourceClaimName or something like it in core/validation and was asked to only have and use it in resource/validation because it didn't belong into the core.
We should validate the fields we have. If we have a reference in core, we should validate it. I don't especially care if the name validation function lives in core and is referenced from resource/validation, or if we duplicate it.
I don't see why we would expose the user to sharp edges like that if we can avoid it with requirements for templated / directly used claim object names
So that's a "yes" for not allowing "-" in the pod.spec.resourceClaims[*].name field, right? Is it okay to tighten that validation in 1.27? I think it's okay as we are making some other "action required" changes; I just wanted to be sure.
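A tightened rule like the one proposed here could look roughly like the sketch below: a DNS-label-like name that additionally disallows "-", so the combined pod.Name + "-" + claim.Name can always be split unambiguously. This is a hypothetical illustration of the idea under discussion, not the validation Kubernetes actually ships; the function name and exact limits are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// podClaimNameRE: lowercase alphanumerics only — deliberately stricter
// than a DNS label because "-" is excluded to avoid ambiguous joins.
var podClaimNameRE = regexp.MustCompile(`^[a-z0-9]+$`)

// validatePodClaimName sketches a tightened rule for
// pod.spec.resourceClaims[*].name (hypothetical, for illustration).
func validatePodClaimName(name string) error {
	if len(name) == 0 || len(name) > 63 {
		return fmt.Errorf("must be between 1 and 63 characters")
	}
	if !podClaimNameRE.MatchString(name) {
		return fmt.Errorf("may only contain lowercase alphanumeric characters (no %q)", "-")
	}
	return nil
}

func main() {
	fmt.Println(validatePodClaimName("gpu"))     // <nil>
	fmt.Println(validatePodClaimName("bar-baz")) // rejected: contains "-"
}
```

With such a rule, the foo/bar-baz vs. foo-bar/baz collision cannot occur, because only the pod name may contain "-".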
We should validate the fields we have. If we have a reference in core, we should validate it.
/cc @thockin
I think you were the one who argued that core shouldn't validate against requirements defined by the v1alpha1 resource API.
I can add it back if the consensus is that we should do that.
a kubelet on a specific node won't know which ResourceClaim it needs to read for a pod
OK, that's a fair point - ideally "The set of resources which need to be read in order to run a pod are directly linked by the pod". Good principle.
Storing the generated name in the status would work, but does anyone here really have the motivation for another core/v1 API change?
I do this all day. v1 changes aren't SO bad given that you already have a gate in place.
@liggitt what think?
ideally "The set of resources which need to be read in order to run a pod are directly linked by the pod". Good principle.
Yeah, I like that. Either deterministic mappings or explicit references from pod spec/status work in my view.
Putting the name of the spawned claim(s) in pod status for kubelets to discover and for the node authorizer to authorize on could work. NodeRestriction admission would need an update to disallow nodes adding names of claims in pod status, but that's not hard.
I don't have a good sense for the trust level we're placing in the controller(s) handling these claim objects and whether we'd be happy letting those controller(s) write to status of arbitrary pods. I haven't followed the design enough to know if we expect a single central controller running on the control plane (which we'd be happy to let write pod status), or if this is a distributed bring-your-own-controller model (where we would not want to hand out pod status update permissions like candy).
It's a bring-your-own-controller model. However, I don't think they would need write permission to the PodStatus (just read permission).
sorry, I meant the controller creating the resourceclaim object and updating pod status (so the node and node authorizer could allow / get the claim object)
Right -- that's launched by the kube-controller-manager and implemented here: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourceclaim/controller.go
I'd like to propose that we not merge this PR for 1.27. There's clearly a need for further discussion. What may be useful now is adding the validation of the generated name and object references, without changing how things work.
The generated ResourceClaim name and the names of the ResourceClaimTemplate and ResourceClaim referenced by a pod must be valid according to the resource API, otherwise the pod cannot start. Checking this was removed from the original implementation out of concerns about validating fields in core against limitations imposed by a separate, alpha API. Because this was pointed out again in kubernetes#116254 (comment), it gets added back. The same strings that worked before still work now. In particular, the constraints for a spec.resourceClaim.name are still the same (DNS label).
Yeah, we don't want to block cleanup of removed pods on the unavailability of related resources (those lookups could fail for lots of reasons: being forbidden, being deleted, etc.).
The status determines which claims kubelet is allowed to access when claims get created from a template. Therefore kubelet must not be allowed to modify that part of the status, because otherwise it could add an entry and then gain access to a claim it should not have access to.
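The status-driven authorization rule described here can be sketched as a simple membership check: kubelet may read a claim generated from a template only if the control plane recorded that claim's name in the pod status. The types below are simplified stand-ins for the real API objects and authorizer plumbing, used only to illustrate the rule.

```go
package main

import "fmt"

// podStatus is a simplified stand-in for the relevant part of pod status:
// the claim names recorded by the control-plane resourceclaim controller.
type podStatus struct {
	ResourceClaimNames []string
}

// claimAccessAllowed sketches the node authorizer's decision for a
// kubelet GET on a generated ResourceClaim: allow only if the claim name
// appears in the status written by the control plane. Since kubelet is
// prevented (by NodeRestriction) from editing these entries itself, it
// cannot grant itself access to additional claims.
func claimAccessAllowed(status podStatus, claimName string) bool {
	for _, name := range status.ResourceClaimNames {
		if name == claimName {
			return true
		}
	}
	return false
}

func main() {
	status := podStatus{ResourceClaimNames: []string{"my-pod-gpu"}}
	fmt.Println(claimAccessAllowed(status, "my-pod-gpu")) // true
	fmt.Println(claimAccessAllowed(status, "other-claim")) // false
}
```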
/lgtm
LGTM label has been added. Git tree hash: b83a9068c3be1cf8b969663c3fb7fea31d6ef996
@bart0sh: can you help get this PR merged?
/hold
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liggitt, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/test pull-kubernetes-kind-dra
/unhold
/triage accepted
What type of PR is this?
/kind feature
What this PR does / why we need it:
As discussed during the KEP and code review of dynamic resource allocation for Kubernetes 1.26, to increase security kubelet should only get read access to those ResourceClaim objects that are needed by a pod on the node.
This can be enforced easily by the node authorizer:
Which issue(s) this PR fixes:
Fixes #113866
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: