
CVE-2021-25740: Endpoint & EndpointSlice permissions allow cross-Namespace forwarding #103675

Closed
cjcullen opened this issue Jul 14, 2021 · 15 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@cjcullen
Member

cjcullen commented Jul 14, 2021

A security issue was discovered with Kubernetes that could enable users to send network traffic to locations they would otherwise not have access to via a confused deputy attack.

This issue has been rated Low severity (CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:L/I:N/A:N), and assigned CVE-2021-25740.

Am I vulnerable?

If a potential attacker can create or edit Endpoints or EndpointSlices in the Kubernetes API, they can potentially direct a LoadBalancer or Ingress implementation to expose backend IPs the attacker should not have access to.
Importantly, if the target’s NetworkPolicy already trusts the LoadBalancer or Ingress implementation, NetworkPolicy cannot be used to prevent exposure from other namespaces, potentially bypassing any security controls such as LoadBalancerSourceRanges.
This issue is a design flaw that cannot be fully mitigated without user-facing changes. With this public announcement, we can begin conversations about a long-term fix.
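
For illustration only, here is a minimal sketch of the pattern described above, with made-up names and IPs: a selector-less Service in one namespace whose hand-written Endpoints point at a backend in another namespace, which a LoadBalancer or Ingress implementation would then forward traffic to.

# Hypothetical example; the namespace "attacker-ns" and the IP are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: innocuous-svc
  namespace: attacker-ns
spec:
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: v1
kind: Endpoints
metadata:
  name: innocuous-svc        # must match the Service name
  namespace: attacker-ns
subsets:
  - addresses:
      - ip: 10.0.12.34       # pod IP of a workload in a different namespace
    ports:
      - port: 8080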

Affected Versions

All Kubernetes versions are affected.

How do I mitigate this vulnerability?

There is no patch for this issue, and it can currently only be mitigated by restricting access to the vulnerable features. To mitigate the exposure, we recommend restricting write access to Endpoints and EndpointSlices by updating the system:aggregate-to-edit role using the attached file. This will remove write access to Endpoints from the admin and edit roles:

# Allow kubectl auth reconcile to work
kubectl annotate --overwrite clusterrole/system:aggregate-to-edit rbac.authorization.kubernetes.io/autoupdate=true

# Test reconcile, then run for real if happy
kubectl auth reconcile --remove-extra-permissions -f aggregate_to_edit_no_endpoints.yaml.txt --dry-run
kubectl auth reconcile --remove-extra-permissions -f aggregate_to_edit_no_endpoints.yaml.txt

# Prevent autoreconciliation back to old state
kubectl annotate --overwrite clusterrole/system:aggregate-to-edit rbac.authorization.kubernetes.io/autoupdate=false

Note: This will prevent new versions of Kubernetes from reconciling new default permissions to this role. No new default permissions have been added to this role since v1.14.0, but we recommend you remove the autoupdate=false annotation as soon as a fix or other mitigation is possible.
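
As a quick sanity check (these commands are illustrative and not part of the advisory), you can confirm that the aggregated role no longer mentions Endpoints after reconciling:

# Should normally print nothing once endpoints/endpointslices write access has been removed
kubectl get clusterrole system:aggregate-to-edit -o yaml | grep endpoints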

For use-cases that need to edit these resources, we recommend creating a new purpose-built Role with the desired permissions, and using it only for those cases.
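
As a rough sketch of that approach (all names here are hypothetical), a namespaced Role and RoleBinding granting Endpoints and EndpointSlice write access to a single ServiceAccount might look like:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpoints-editor
  namespace: my-app
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: endpoints-editor
  namespace: my-app
subjects:
  - kind: ServiceAccount
    name: endpoints-controller   # the workload that actually needs this access
    namespace: my-app
roleRef:
  kind: Role
  name: endpoints-editor
  apiGroup: rbac.authorization.k8s.io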

Detection

Services with an empty selector rely on custom endpoints and are vulnerable to the attack described above. We recommend manually auditing any such usage. The following kubectl command will list all Services in a cluster with their selector:

kubectl get svc --all-namespaces -o=custom-columns='NAME:metadata.name,NAMESPACE:metadata.namespace,SELECTOR:spec.selector'

Note: Some Services without selectors specified may have their Endpoints managed by other controllers or tools. For example, endpoints for the default/kubernetes Service are managed by the Kubernetes API Server.
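
If jq is available, a rough way to narrow that output to only selector-less Services (which still need manual review, for the reason noted above) is:

kubectl get svc --all-namespaces -o json \
  | jq -r '.items[] | select(.spec.selector == null) | "\(.metadata.namespace)/\(.metadata.name)"'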

If you find evidence that this vulnerability has been exploited, please contact security@kubernetes.io.

Additional Advisory

A similar attack is possible using Ingress implementations that support forwarding to ExternalName Services. This can be used to forward to Services in other namespaces or, in some cases, sensitive endpoints within the Ingress implementation. If you are using the Ingress API, we recommend confirming that the implementation you’re using either does not support forwarding to ExternalName Services or supports disabling the functionality.
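
As a sketch of that pattern (all names are made up), an ExternalName Service in one namespace can alias a Service in another namespace, and an Ingress rule pointing at the alias could forward traffic there, depending on the implementation:

# Hypothetical ExternalName alias crossing namespaces
apiVersion: v1
kind: Service
metadata:
  name: sneaky-alias
  namespace: attacker-ns
spec:
  type: ExternalName
  externalName: internal-api.victim-ns.svc.cluster.local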

Additional Details

See the GitHub issue for more updates: #103675

Thank You,
Rob Scott on behalf of Kubernetes SIG Network and CJ Cullen on behalf of the Kubernetes Product Security Committee

@cjcullen cjcullen added the kind/bug Categorizes issue or PR as related to a bug. label Jul 14, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 14, 2021
@cjcullen cjcullen removed kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 14, 2021
@kubernetes kubernetes deleted a comment from k8s-ci-robot Jul 14, 2021
@kubernetes kubernetes deleted a comment from k8s-ci-robot Jul 14, 2021
@cjcullen cjcullen changed the title WIP CVE-2021-25740: Endpoint & EndpointSlice permissions allow cross-Namespace forwarding Jul 14, 2021
@robscott
Member

robscott commented Jul 15, 2021

The aggregate_to_edit.yaml file referenced above was attached to the original disclosure email: https://groups.google.com/g/kubernetes-announce/c/aXolwNe_KT4/m/HKK3174yAQAJ. I'm not sure if there's a good way to attach it here.

oktalz added a commit to haproxytech/kubernetes-ingress that referenced this issue Jul 15, 2021
oktalz added a commit to haproxytech/kubernetes-ingress that referenced this issue Jul 15, 2021
@liggitt
Member

liggitt commented Jul 15, 2021

appended .txt suffix and attached it to the description

szuecs added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue Jul 28, 2021
@rtheis

rtheis commented Nov 17, 2021

Hi folks, I followed the instructions in the security advisory after upgrading my cluster from Kubernetes v1.21 to v1.22, and I'm left with the following difference, which I suspect is due to #102858. The end result is that I think the aggregate_to_edit.yaml file attached to https://groups.google.com/g/kubernetes-security-announce/c/WYE9ptrhSLE/m/EODhNR9yAQAJ needs to be updated. I would appreciate it if someone could review these findings. Thanks.

vagrant@verify-cluster:~/armada-ansible$ diff  /tmp/cluster-role-cve-1-21-post-upgrade-fix.yaml  /tmp/cluster-role-1-22.yaml
5,6c5,6
<     rbac.authorization.kubernetes.io/autoupdate: "false"
<   creationTimestamp: "2021-11-17T17:27:18Z"
---
>     rbac.authorization.kubernetes.io/autoupdate: "true"
>   creationTimestamp: "2021-11-15T16:50:51Z"
11,12c11,12
<   resourceVersion: "7386"
<   uid: 583ecf3d-47e7-450e-a3be-d7d45732d81a
---
>   resourceVersion: "96"
>   uid: 1c3d0834-8abd-4b1e-8d74-57b0b8cb1caf
50a51
>   - events

@rtheis

rtheis commented Dec 1, 2021

Is it possible to amend the security announcement with the changes noted in my previous comment?

@rtheis

rtheis commented Feb 25, 2022

I'm sorry, I misunderstood: the system:aggregate-to-edit change can be undone before upgrading to 1.22 by running kubectl annotate --overwrite clusterrole/system:aggregate-to-edit rbac.authorization.kubernetes.io/autoupdate=true. This should give users the secure behavior plus the 1.22 additions once the upgrade completes.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 26, 2022
@boindil

boindil commented Jun 10, 2022

Hi,

could anyone explain whether or not this has been fixed (and in what version)? We are currently using GKE (v1.21.11 and v1.22.8).

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 10, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cjcullen
Member Author

cjcullen commented Aug 9, 2022

Getting closed out by the triage bot is kind of an unceremonious end, but this issue should have been closed a while back anyway, given the work done to remove Endpoint & EndpointSlice permissions from the default roles and the docs changes to highlight the risk.

@JohnJAS

JohnJAS commented Oct 14, 2022

Although this issue was closed, please note that the Endpoint & EndpointSlice permissions won't be removed automatically when upgrading Kubernetes from an impacted version.

Here are some tips for this situation. Please correct me if anything is wrong.

I suggest removing only the Endpoint & EndpointSlice permissions, instead of directly using the yaml above, because that yaml would delete additional permissions on newer Kubernetes versions. It is also better to set rbac.authorization.kubernetes.io/autoupdate to true, because new Kubernetes versions won't add endpoints and endpointslices back to the clusterrole.

Commands for reference:

# Set rbac.authorization.kubernetes.io/autoupdate to true and keep it
kubectl annotate --overwrite clusterrole/system:aggregate-to-edit rbac.authorization.kubernetes.io/autoupdate=true

# Remove Endpoint & EndpointSlice permission only
kubectl get clusterroles system:aggregate-to-edit -o yaml > /tmp/aggregate_to_edit_no_endpoints.yaml
sed -i '/endpoints/d' /tmp/aggregate_to_edit_no_endpoints.yaml
sed -i '/endpointslices/d' /tmp/aggregate_to_edit_no_endpoints.yaml
kubectl auth reconcile --remove-extra-permissions -f /tmp/aggregate_to_edit_no_endpoints.yaml

@RyanStan

RyanStan commented Nov 8, 2022

Is this the PR (#103703) that fixed this? It looks like it went into 1.22.0.

@robscott
Member

robscott commented Nov 8, 2022

@RyanStan this is the most relevant PR (also merged in 1.22): #103704. The other PR reverts a change that was merged in the 1.22 release cycle but later reverted. As the issue above states, that patch is only effective for new 1.22+ clusters; others will want to run the manual mitigation steps described above.

@antoinetran

Hi, when using vcluster, this RBAC is needed (see loft-sh/vcluster#1465). We cannot deploy vcluster, which is a really useful tool for deploying components that need cluster-wide privileges for CRDs, in a shared Kubernetes environment.

I am thinking about alternative ways to mitigate this CVE while still being able to deploy vcluster. Since Ingress is somewhat deprecated in favor of the Gateway API (which is now GA and integrated into OpenShift): does this CVE impact the Gateway API? Maybe if we disable Ingresses and only enable Gateway in our shared Kubernetes cluster, and this CVE is not relevant in that configuration, then our admin could grant the Endpoints privilege again.
