Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential security flaw of 9pfs: SUID/SGID not drop on file write in 9pfs #29

Open
ericvh opened this issue Mar 7, 2023 · 3 comments
Open

Comments

@ericvh
Copy link

ericvh commented Mar 7, 2023

Impact for 9p is basically the same as CVE-2018-13405, and the core
misbehaviour described was [1], quote:

"The intended behavior is that the non-member user can trigger creation of a
directory with group execution and SGID permission bits set whose group
ownership is of that group, but not a plain file."

[1] https://bugzilla.redhat.com/show_bug.cgi/?id=1599161

The thing here is though, this behaviour seems to differ among OSes and even
individual file systems: some would inherit permission bits to both newly
created directories and files, and others only for new directories. XFS
appears to have a config option fs.xfs.irix_sgid_inherit for that.

There is neither a specific behaviour restriction defined for either POSIX or
Linux that would cover the described expected behaviour, no?

+CC Dominique, Eric (9pfs Linux client)

On Wednesday, March 1, 2023 4:26:49 PM CET Michael S. Tsirkin wrote:
+CC Greg and Christian who know more about 9pfs.
Guys could you chime in on the security impact of this? Thanks!

On Wed, Mar 01, 2023 at 05:14:29PM +0800, JIETAO XIAO wrote:
Hi,

Our team finds when a local user in the guest tries to write an executable file
with SUID or SGID, none of these privileged bits are correctly dropped. As a
result, in rare circumstances(exist an executable file owned by root, writable
by others, has SUID/SGID bits), this flaw could be used by malicious users in
the guest to elevate their privileges within the guest and help a host's local
user to elevate privileges on the host.

Environment:

• QEMU version: QEMU emulator version 6.2.0
• Host/Guest architecture: X86
• Host kernel version: 5.17.0-rc1
• Guest kernel version: 6.1.14

Configurations:
We have tested it under different 9pfs configurations. The Qemu startup file is
like this:

sudo qemu-system-x86_64
-kernel /home/shawotao/tmp/linux/arch/x86/boot/bzImage
-append 'root=/dev/sda rw console=ttyS0'
-drive file=rootfs.img,format=raw
-M pc -cpu host --enable-kvm -smp 2
-m 2G
-chardev stdio,mux=on,id=mon -mon chardev=mon,mode=readline
-device virtio-serial-pci -device virtconsole,chardev=mon -vga none
-display none
-virtfs local,path=/tmp/test,mount_tag=myfs,security_model=passthrough,id=
fsdev0,writeout=immediate

Effected Configuration: fsdriver = local, all security model is affected,
mount 9pfs with cache=mmap

Reproduce Steps:

  1. Use the above configuration to startup QEMU.
  2. Mount 9pfs in the guest: root@virt: ~# mount -t 9p -o trans=virtio myfs /
    mnt -oversion=9p2000.L,posixacl,msize=512000,cache=mmap
  3. Prepare an executable file owned by root, writable by others, and has SUID,
    SGID: root@virt:/mnt# ./prepare_suid_file.sh
  4. Change to a local user: root@virt:/mnt# su shawtao
  5. Write the SGID/SUID executable file: shawtao@virt: ~# ./write

Root cause && fix suggestions:

  1. Missing kill SUID/SGID check in v9fs_file_write_iter(), so considering call
    file_modified() in v9fs_file_write_iter() can solve the issue.
  2. Normally, the host filesystem(i.e. ext4) can also help to clear the SUID/
    SGID when using security_model=none/passthrough. However, Qemu needs to run as
    root with this setting, thus the CAP_FSETID of the Qemu process helps the write
    operation pass the host filesystem's check and keep the SUID/SGID. (So the 9p
    server in Qemu can drop CAP_FSETID on every write like what
    virtfs-proxy-helper does)

Whom to Acknowledge for Reporting This Issue:
Name: Jietao Xiao email: [email protected]
Name: Wenbo Shen email: [email protected]
Name: Jinku Li email: [email protected]
Name: Nanzi Yang email: [email protected]

Best regards
Jietao Xiao

#!/bin/bash

dd if=/dev/zero of=suid_file bs=1024 count=1

chmod u+s,g+s,o+w,ugo+x suid_file

@carnil
Copy link

carnil commented Jul 20, 2023

This appears to be CVE-2023-1386

@rfrohl
Copy link

rfrohl commented Aug 8, 2023

Is there any reasoning why this was closed as NOTABUG by RedHat that someone can share?

@ericvh
Copy link
Author

ericvh commented Aug 8, 2023 via email

ericvh pushed a commit that referenced this issue May 12, 2024
As previously explained, the rehash delayed work migrates filters from
one region to another. This is done by iterating over all chunks (all
the filters with the same priority) in the region and in each chunk
iterating over all the filters.

When the work runs out of credits it stores the current chunk and entry
as markers in the per-work context so that it would know where to resume
the migration from the next time the work is scheduled.

Upon error, the chunk marker is reset to NULL, but without resetting the
entry markers despite being relative to it. This can result in migration
being resumed from an entry that does not belong to the chunk being
migrated. In turn, this will eventually lead to a chunk being iterated
over as if it is an entry. Because of how the two structures happen to
be defined, this does not lead to KASAN splats, but to warnings such as
[1].

Fix by creating a helper that resets all the markers and call it from
all the places the currently only reset the chunk marker. For good
measures also call it when starting a completely new rehash. Add a
warning to avoid future cases.

[1]
WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0
Modules linked in:
CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G        W          6.9.0-rc3-custom-00880-g29e61d91b77b #29
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_afk_encode+0x242/0x2f0
[...]
Call Trace:
 <TASK>
 mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0
 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470
 process_one_work+0x151/0x370
 worker_thread+0x2cb/0x3e0
 kthread+0xd0/0x100
 ret_from_fork+0x34/0x50
 </TASK>

Fixes: 6f9579d ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
ericvh pushed a commit that referenced this issue May 28, 2024
Reinitialize the whole EST structure would also reset the mutex
lock which is embedded in the EST structure, and then trigger
the following warning. To address this, move the lock to struct
stmmac_priv. We also need to reacquire the mutex lock when doing
this initialization.

DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 505 at kernel/locking/mutex.c:587 __mutex_lock+0xd84/0x1068
 Modules linked in:
 CPU: 3 PID: 505 Comm: tc Not tainted 6.9.0-rc6-00053-g0106679839f7-dirty #29
 Hardware name: NXP i.MX8MPlus EVK board (DT)
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __mutex_lock+0xd84/0x1068
 lr : __mutex_lock+0xd84/0x1068
 sp : ffffffc0864e3570
 x29: ffffffc0864e3570 x28: ffffffc0817bdc78 x27: 0000000000000003
 x26: ffffff80c54f1808 x25: ffffff80c9164080 x24: ffffffc080d723ac
 x23: 0000000000000000 x22: 0000000000000002 x21: 0000000000000000
 x20: 0000000000000000 x19: ffffffc083bc3000 x18: ffffffffffffffff
 x17: ffffffc08117b080 x16: 0000000000000002 x15: ffffff80d2d40000
 x14: 00000000000002da x13: ffffff80d2d404b8 x12: ffffffc082b5a5c8
 x11: ffffffc082bca680 x10: ffffffc082bb2640 x9 : ffffffc082bb2698
 x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
 x5 : ffffff8178fe0d48 x4 : 0000000000000000 x3 : 0000000000000027
 x2 : ffffff8178fe0d50 x1 : 0000000000000000 x0 : 0000000000000000
 Call trace:
  __mutex_lock+0xd84/0x1068
  mutex_lock_nested+0x28/0x34
  tc_setup_taprio+0x118/0x68c
  stmmac_setup_tc+0x50/0xf0
  taprio_change+0x868/0xc9c

Fixes: b2aae65 ("net: stmmac: add mutex lock to protect est parameters")
Signed-off-by: Xiaolei Wang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants