emptyDir with medium: Memory mounts a tmpfs volume without nosuid,nodev,noexec #48912
Comments
@kubernetes/sig-storage-misc
I don't follow why this is needed. emptyDir creates a new tmpfs "device" from nothing by doing the equivalent of "mount -t tmpfs tmpfs" on the volume directory, and the only thing that then has access to that device is the pod. I think the advice about "noexec" etc. is about /tmp specifically, not tmpfs in general.
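To see what a particular emptyDir actually got, you can look at its entry in /proc/mounts from inside the pod. Below is a minimal POSIX sh sketch. The function name missing_hardening_opts is hypothetical, and it assumes the usual /proc/mounts layout (device, mount point, fs type, comma-separated options, ...). What the container sees can also depend on how the runtime bind-mounts the volume, so treat the result as an observation, not a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: report which of nosuid, nodev, noexec are missing
# from the mount at a given path.
# Usage: missing_hardening_opts /tmp [/proc/mounts]
# Assumes /proc/mounts format: device mountpoint fstype options dump pass.
# Mount points containing spaces appear escaped (\040) and are not handled.
missing_hardening_opts() {
  target=$1
  mounts_file=${2:-/proc/mounts}
  # Take the options field (4th) of the last entry for this mount point;
  # later entries shadow earlier ones when mounts are stacked.
  opts=$(awk -v t="$target" '$2 == t { o = $4 } END { print o }' "$mounts_file")
  if [ -z "$opts" ]; then
    echo "no mount at $target" >&2
    return 2
  fi
  missing=""
  for want in nosuid nodev noexec; do
    # Wrap in commas so an option only matches as a whole word.
    case ",$opts," in
      *",$want,"*) ;;
      *) missing="$missing $want" ;;
    esac
  done
  # Strip the leading space; prints an empty line if nothing is missing.
  echo "${missing# }"
}
```

For example, running missing_hardening_opts /tmp in a pod whose /tmp is a medium: Memory emptyDir prints whichever of the three options are absent, or an empty line when all of them are set.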
@wongma7 then it is Docker that is wrong. Try
Good point; I dug up the issue that led to that: moby/moby#12143. I'm hardly qualified to talk about security issues, so I'll defer to the reviewers :). The only thing I'll add for the reviewers to consider is that the Docker options are easily overridable, whereas an emptyDir user will be stuck with noexec no matter what.
This is what I was thinking about as well. On the one hand, it would probably be good to *allow* users to mount tmpfs with the possibility of running executables. On the other hand, one container == one process. Kubernetes has elaborated this principle by introducing pods as a means of composing container applications <http://blog.kubernetes.io/2015/06/the-distributed-system-toolkit-patterns.html>, thus encouraging people to follow the principle of one process per container while addressing the need to run helper tools in scenarios where this is necessary. Running a container with a read-only root filesystem and noexec writable mounts is just one of the steps in eliminating possible attack vectors and ensuring that only the stuff pre-baked into the container will be executed. Here is a bunch of examples: <https://www.slideshare.net/frohoff1/appseccali-2015-marshalling-pickles>.
Fine-grained controls on emptyDir aren't a bad idea, but for backwards compatibility we'd need to preserve today's behavior AND have a PSP rule to control them.
Ok, to summarize:
Is there any action on this? We currently have this item in our risk register. While almost all of our pod filesystems are read-only, we are required to mount an emptyDir temporary filesystem at
Can someone explain the attack vector that this is trying to prevent?
It's part of a defense-in-depth strategy. For an attacker, writing executables to disk and executing them is generally a lower bar than convincing an application to allocate a block of memory, mark it as executable, and jump to it.
So if I understand correctly, you want to prevent a scenario like:
So if kubelet mounted the emptyDir as noexec, it would prevent this scenario. But do you even need an emptyDir to do this? Couldn't you also do this with the container's writable layer?
We run all our pods with
@msau42
@kubernetes/sig-auth-feature-requests IMO this feature doesn't have to take the form of emptyDir.mountOptions. It doesn't even need to be an emptyDir field; it could be a pod.securityContext field that says "all of this pod's emptyDirs must be mounted noexec,nosuid,nodev". IMO that would satisfy this use case well enough without bringing in any of the complications of having PSP validations parse all pods' volumes. Edit: I didn't mean to recategorize this as a feature; read 'feature' as 'solution to the bug' and 'use case' as 'scenario'.
Good point. Could you please point me to where to start digging into the code for this?
I think I've got a change that might do what the OP was looking for, but haven't yet been able to do a proper e2e test. Having gone through this, though, I'm wondering if this is really the right solution, or if the right solution is more general and involves adding Wanting to have Briefly digging into that rabbit hole seems to lead quickly to a gap in the runtime API, where we can't pass arbitrary mount options to volumes, and even Not sure if this is worth pursuing further, especially given the limitation to the memory-based Thoughts?
@glb but Docker does support this:
@adampl true, and in the branch I created to try this out, it seems pretty easy to extend It's even equivalently easy to do the same for However, for the default
@glb Yes, I agree, this should be more generic.
Because I couldn't leave it alone: today I learned that yes, you can bind-mount a directory to itself with options, for example:
# mkdir foo
# cat > foo/x.sh <<EOHD
#!/bin/sh
echo hi
EOHD
# chmod 755 foo/x.sh
# foo/x.sh
hi
# mount -o bind foo foo
# mount -o remount,bind,noexec foo foo
# foo/x.sh
bash: foo/x.sh: Permission denied
# umount foo
# foo/x.sh
hi
So it would be possible to make I still don't think that's the right solution here... the right answer feels like having mount options for all
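The two mount commands in the transcript above can be wrapped into a small helper. This is a sketch assuming a Linux host with util-linux mount and root (or CAP_SYS_ADMIN); remount_with_opts, run_or_print and the DRY_RUN switch are hypothetical names, and with DRY_RUN=1 the helper only prints the commands it would run.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Bind-mount DIR onto itself, then remount that bind mount with extra
# per-mount flags (default: nosuid,nodev,noexec).
# Needs root unless DRY_RUN=1. Usage: remount_with_opts DIR [OPTS]
remount_with_opts() {
  dir=$1
  opts=${2:-nosuid,nodev,noexec}
  run_or_print mount -o bind "$dir" "$dir" &&
    run_or_print mount -o "remount,bind,$opts" "$dir" "$dir"
}
```

Remounting the bind mount only changes its per-mount flags such as nosuid, nodev and noexec; unmounting it, as in the transcript, restores the original behavior.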
Mount options in volumeMounts were shot down for security reasons. For example, there is some option that can halt the node,
@wongma7 yeah, that would be bad. Maybe there needs to be some allow list; perhaps this is what was meant in earlier comments about
Yep, I don't like the idea of ditching useful options for security reasons. The dangerous options should be disallowed by default, but it should be possible to allow whichever options the cluster admin deems appropriate.
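An allow list along those lines could look like the following POSIX sh sketch. validate_tmpfs_opts, DEFAULT_ALLOWED and the TMPFS_ALLOWED_OPTS override are hypothetical, and the default list is an illustrative assumption rather than the set Kubernetes or Docker actually accepts. Each option is checked by its key, so size=64m is judged as size.

```shell
#!/bin/sh
# Illustrative default allow list (an assumption, not an official list).
DEFAULT_ALLOWED="rw ro nosuid suid nodev dev noexec exec size mode uid gid"

# Hypothetical check: succeed only if every comma-separated option's key
# is on the allow list. A cluster admin could widen the list by setting
# TMPFS_ALLOWED_OPTS (space-separated keys).
validate_tmpfs_opts() {
  allowed=${TMPFS_ALLOWED_OPTS:-$DEFAULT_ALLOWED}
  old_ifs=$IFS
  IFS=,
  for opt in $1; do
    IFS=$old_ifs
    key=${opt%%=*}   # "size=64m" -> "size"
    case " $allowed " in
      *" $key "*) ;;
      *) echo "invalid tmpfs option: $opt" >&2; return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}
```

An unknown option such as errors=panic is rejected unless an admin explicitly adds it to TMPFS_ALLOWED_OPTS, which is the default-deny behavior the comment asks for.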
There was significant discussion around this when introducing mount options in kubernetes/community#321 (comment). The resolution was that mount options would only be supported for PV/PVC volume sources, not inline pod volumes.
@liggitt I may have skimmed too quickly, but the result of the conversation you linked seems to mean that a) "mount options" are really on the PV and would therefore apply equally to any container that mounts them (I can't think of a reasonable real-life counterexample); b) setting options for Any thoughts on what the right answer could be for I don't think it would be better to have Yet Another Boolean Option
@liggitt First of all, I don't really understand why inline pod volumes are regarded as something separate from normal PVs. For example, why can't we have an The potential to crash a node can be a problem only for a subset of Kubernetes clusters running on shared (cloud) infrastructure. For me, that is not a sufficient reason to block other use cases.
/area security
Without following all the context around PV/PVC mountOptions, I'm generally +1 on the idea of adding these (noexec,nosuid,nodev) as options on per-container volume mounts, or as options in the securityContext. I think defaulting those to
Those options are probably useful (especially 'exec'), but would it be possible to add a 'uid' volume mount parameter? For better security, I don't want anything to run as root in my container, and I also want my root FS to be read-only. I just need a small subdirectory where I dynamically generate a configuration file before launching the daemon. This small volume does not have to be persisted, and emptyDir would fit perfectly if the tmpfs directory it provides could belong to a non-root user. Using bare Docker, I can use the '--tmpfs /dir:rw,uid=$UID' option. In Kubernetes, I have no way to set the 'uid=' option, so I cannot combine a read-only root FS with running as non-root. A more common case is the need for a writable /tmp. But enabling non-root users to write to /tmp requires a chmod, which must be executed as root. Same case: I need to start as root and su/sudo to start the daemon -> lower security.
It sounds like a use case for the
Could you elaborate a little more on the last sentence, please? I don't see where you would need chmod:
$ cat test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: test-pod-
spec:
  containers:
  - image: ubuntu
    name: test-container
    command: [ "/bin/bash" ]
    args: [ "-c", "id; touch /tmp/test-file; ls -l /tmp" ]
    securityContext:
      runAsUser: 1000
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: tmp-volume
      mountPath: /tmp
  restartPolicy: Never
  volumes:
  - name: tmp-volume
    emptyDir: {}
$ kubectl.exe create -f test-pod.yaml
pod/test-pod-77q5h created
$ kubectl.exe logs pod/test-pod-77q5h
uid=1000 gid=0(root) groups=0(root)
total 0
-rw-r--r-- 1 1000 root 0 Oct 7 11:57 test-file
You're right. I thought emptyDir would set the mount point to 755 permissions, but it is 777. So no chmod is needed.
Hi, I installed "Velero", and it tries to exec something on a volume that was by default mounted from an I'm the only one having that, I just ran the commands listed in the how-to. Note that a year ago I had problems with Jenkins too: I had to mount its volumes on the host volume, otherwise it was unable to run its custom Jenkins scripts (still partition with I'm on a Kubernetes v1.15 cluster managed by Kops on GCE... What could make that possible? This is so strange that any help is welcome :) :)
On a kubeadm v1.18.15 cluster, I created a pod with a read-only root filesystem and an emptyDir /tmp. I was able to exec into the container, create a script in /tmp, and execute it. It would be great if we could specify mountOptions for emptyDir volumes.
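The same experiment can be repeated from inside any pod with a small probe that writes an executable file into a directory and tries to run it. This is a sketch: can_exec_in is a hypothetical name, and it assumes a mktemp that accepts a template path (GNU coreutils and BusyBox both do).

```shell
#!/bin/sh
# Hypothetical probe: can an executable file written into DIR be run?
# Prints "exec allowed" or "exec denied"; returns 2 if DIR is not writable.
# The probe file is removed afterwards.
can_exec_in() {
  dir=$1
  probe=$(mktemp "$dir/exec-probe.XXXXXX" 2>/dev/null) || return 2
  printf '#!/bin/sh\nexit 0\n' > "$probe"
  chmod 700 "$probe"
  # On a noexec mount the kernel refuses execve(), so this branch fails.
  if "$probe" 2>/dev/null; then
    echo "exec allowed"
  else
    echo "exec denied"
  fi
  rm -f "$probe"
}
```

Running can_exec_in /tmp in a pod like the one described should print exec allowed while the emptyDir is mounted without noexec, and exec denied once noexec is in effect.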
Hi Sorry,
Just to bring up an (IMO) valid use case that would easily break with
This is what I want to prevent in my clusters. We have some pods that run as non-root with a read-only rootfs, but which use a writable /tmp from an The main concern is that being able to execute any file marked as executable, or worse, suid and executable, in a writable directory is a big problem that needs more attention. It would be really appreciated if this were prioritized higher, as it seems to me to be a security weakness in Kubernetes. I'd also like to cast my vote that the most commonly desired use case is where the mount options are defined in the container spec, not the volume spec. I care less about the backing storage medium; what matters is that the container has the mount options set so that nothing running in the container can execute from any of the writable volumes.
Sweeping dust off this issue.
Motivation
CIS Benchmark for distribution-independent Linux has the following controls (and those controls propagate to other distribution-dependent benchmarks):
Although the aforementioned controls were created with a generic Linux OS in mind and cover the In the sense that, in the ideal scenario, one would like to ensure that only one executable is being executed inside the container and that this executable comes from the image's
Possible solution
We could reconsider the decision to disallow mountOptions for inline volumes and re-use the
Rationale
I have read through kubernetes/community#321 and, as far as I understand, the decision not to allow users to define mountOptions for inline volumes comes from the risk of a node crash, as suggested by @wongma7. Polluting Docker and nerdctl, however, allow passing mount options to tmpfs mounts (which can be considered an equivalent of emptyDir in particular and of inline volumes in general). Docker does it by maintaining a list of valid mount options and does not accept silly things like
Example:
❯ docker run --rm --tmpfs /lol:errors=panic busybox
docker: Error response from daemon: Invalid tmpfs option ["errors" "panic"].
See and and nerdctl that re-uses the
Apart from Docker and nerdctl, I have not performed a prior-art analysis for other container runtimes and clients, but I am curious and will check how podman handles this.
I also desperately need to be able to set mountOptions on emptyDir mounts. I'd need to add the
Has there been any progress on this issue?
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
It is recommended to mount tmpfs with nosuid,noexec,nodev options.
Environment:
kubectl version: Client Version: v1.7.0, Server Version: v1.6.4
uname -a: