Skip to content

Integrate Capabilities in Open-Lambda #301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

m0mosenpai
Copy link

@m0mosenpai m0mosenpai commented May 15, 2025

Upon looking at the code (and the workshop slides), I found 3 possible places where we might drop privileges:

  1. Inside freshProc() [src/worker/sandbox/sock.go] right before exec-ing the server.py.
  2. Inside fork_server() [min-image/runtimes/server.py] right before the while loop that forks a new Zygote
  3. Inside start_container() [ min-image/runtimes/server.py] right before forking a Leaf to run the user code.

Some general questions I have:

  1. When and on what basis is either of web_server.py or fork_server.py invoked? (EDIT: I meant web_server() and fork_server() functions in server.py my bad)
  2. I also noticed that the base directory for OL gets initialized with 700 permissions and since we run everything as root, it makes it so that I have to su before I can navigate to the directory and create a lambda function with my code. Is it possible that none of this is run as root right from the beginning itself so this issue doesn't occur either? Or maybe the base directory can be initialized with more relaxed permissions?
  3. Which capabilities do we really need at each step? One plan for finding that out was that maybe we can drop all privileges at each step and strace the system calls (or if we know ones that are needed already) to see which capabilities they need and then iteratively add and test them to see if all things work as expected. Which also brings me to another question
  4. Is there a good way I can test all of my changes? Test cases or things I should look out for?

@@ -388,6 +454,8 @@ static PyMethodDef OlMethods[] = {
{"unshare", (PyCFunction)ol_unshare, METH_NOARGS, "unshare"},
{"fork", (PyCFunction)ol_fork, METH_NOARGS, "fork"},
{"enable_seccomp", (PyCFunction)ol_enable_seccomp, METH_NOARGS, "enable_seccomp"},
{"modify_cap", (PyCFunction)ol_modify_cap, METH_VARARGS, "modify_cap"},
Copy link
Author

@m0mosenpai m0mosenpai May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerharter This is still a work in progress. But the basic idea was that maybe we can expose a function here to modify/ drop capabilities.

@tylerharter
Copy link
Member

Great progress! Let's do one case at a time, each in different PRs:

How about this first? "right before forking a Leaf to run the user code". But is it before, or after forking?

For you four questions:

  1. I'm a little confused, I don't think we have files named web_server.py or fork_server.py
  2. OL needs root sometimes to manipulate cgroups and namespaces. Maybe there's something to improve here, but probably as a separate effort (after capability dropping)
  3. Unfortunately, creating namespaces and cgroups requires CAP_SYS_ADMIN (https://man.archlinux.org/man/clone3.2.en). CAP_SYS_ADMIN is very broad and is sometimes called the "new root", so we cannot precisely drop to the exact privileges we need. But something is better than nothing, and this is a great step if CAP_SYS_ADMIN gets broken into finer-grained permissions in the future.
  4. You can see some of the CI tests we run (https://github.com/open-lambda/open-lambda/blob/main/.github/workflows/ci.yml#L64) and do those locally. You should eventually add some tests that try to use capabilities that are not allowed (we should see the invocation fail).

Copy link
Member

@tylerharter tylerharter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start!


capList[0] = cap;
if (cap_set_flag(caps, CAP_EFFECTIVE, 1, capList, setting) == -1) {
cap_free(caps);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the nice things about a goto is it gives you a single place to clean things up, like this:

if (caps != NULL)
cap_free(caps);

@@ -8,6 +9,71 @@
#include <sys/types.h>
#include <sys/wait.h>
#include <seccomp.h>
#include <sys/capability.h>

static int modify_cap_impl(int cap, int setting) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what extent can we just create Python wrappers around the cap API (can_get_proc, cap_set_flag, etc), and then have the logic using it in Python code only?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerharter I couldn't actually find a native way to do this in Python (atleast at a granular level) and then came across the seccomp code in this file. It feels like we should be able to do a lot with the wrappers - like we have good amount of control over how we want to expose these functions to Python. Is there a concern around doing it in this way?

We may have to do it in a different way (or use an external module?) for dropping privileges in sock.go since these would only work in Python.

@@ -208,6 +208,7 @@ def main():
file.write(pid)
print(f'server.py: joined cgroup, close FD {fd_id}')

# drop privileges here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make capability dropping optional, via config, like it is for seccomp? https://github.com/open-lambda/open-lambda/blob/main/min-image/runtimes/python/server.py#L189

@m0mosenpai
Copy link
Author

m0mosenpai commented Jun 10, 2025

@tylerharter My bad, I meant to say web_server and fork_server functions in server.py. I edited the original comment. I couldn't understand which one was being invoked when.

Okay sure, that makes sense. I'll start with CAP_SYS_ADMIN then. Thanks for pointing me to the tests.

@m0mosenpai m0mosenpai closed this Jun 10, 2025
@m0mosenpai m0mosenpai reopened this Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants