[GH-ISSUE #4939] Deny CLONE_NEWUSER (restrict namespaces) #2832

Closed
opened 2026-05-05 09:29:09 -06:00 by gitea-mirror · 11 comments

Originally created by @rusty-snake on GitHub (Feb 13, 2022).
Original GitHub issue: https://github.com/netblue30/firejail/issues/4939

Is your feature request related to a problem? Please describe.

N/A

Describe the solution you'd like

A command (e.g. nonewuser) that blocks calls to clone (and related syscalls such as unshare) when CLONE_NEWUSER is set.

Describe alternatives you've considered

N/A

Additional context

Flatpak does this for example.


@rusty-snake commented on GitHub (Feb 13, 2022):

If anyone wants to play (on x86-64 systems!):

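  • Download deny-clone-newuser.bpf.txt (https://github.com/netblue30/firejail/files/8056478/deny-clone-newuser.bpf.txt) and remove the .txt extension, which is only there because GitHub needs it for uploaded files
  • Run bwrap --seccomp 4 --dev-bind / / /bin/bash 4<~/Downloads/deny-clone-newuser.bpf
  • In the resulting shell, try unshare --user, which should now be denied

Source: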

Cargo.toml:

[package]
name = "deny_clone_newuser_test"
version = "0.1.0"
edition = "2021"

[dependencies]
libc = "0.2"
libseccomp = "0.2.2"

src/main.rs:

use libseccomp::{get_syscall_from_name, scmp_cmp, ScmpAction, ScmpArgCompare, ScmpFilterContext};
use std::io;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    type SyscallBlocklist = &'static [(&'static str, i32, &'static [ScmpArgCompare])];

    const EPERM: i32 = libc::EPERM;
    const ENOSYS: i32 = libc::ENOSYS;
    const CLONE_NEWUSER: u64 = libc::CLONE_NEWUSER as u64;

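    // Each entry: (syscall name, errno to return, argument conditions).
    // On x86-64 the flags are the first argument of both `clone` and `unshare`.
    #[rustfmt::skip]
    const DENY_CLONE_NEWUSER: SyscallBlocklist = &[
        ("clone", EPERM, &[scmp_cmp!($arg0 & CLONE_NEWUSER == CLONE_NEWUSER)]),
        // `clone3` passes its flags in a struct behind a pointer, which seccomp
        // cannot inspect, so deny it outright; ENOSYS lets callers fall back to `clone`.
        ("clone3", ENOSYS, &[]),
        ("unshare", EPERM, &[scmp_cmp!($arg0 & CLONE_NEWUSER == CLONE_NEWUSER)]),
    ];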

    let mut ctx = ScmpFilterContext::new_filter(ScmpAction::Allow)?;

    for &(syscall, errno, comparators) in DENY_CLONE_NEWUSER {
        let syscall_nr = get_syscall_from_name(syscall, None)?;
        let action = ScmpAction::Errno(errno);

        ctx.add_rule_conditional(action, syscall_nr, comparators)?;
    }

    //ctx.export_pfc(&mut io::stdout())?;
    ctx.export_bpf(&mut io::stdout())?;

    Ok(())
}
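
To make the comparator used in the filter above more concrete, here is a minimal std-only sketch of the masked-equality test that `$arg0 & CLONE_NEWUSER == CLONE_NEWUSER` expresses. It does not build or install a seccomp filter; the helper `requests_new_userns` is a hypothetical name used only for illustration, and the `SIGCHLD` value assumes x86-64.

```rust
// Flag values as defined in the Linux UAPI header <linux/sched.h>.
const CLONE_NEWNS: u64 = 0x0002_0000;
const CLONE_NEWUSER: u64 = 0x1000_0000;
// Exit signal that fork-style clone() calls carry in the low byte of the
// flags (value for x86-64).
const SIGCHLD: u64 = 17;

/// Hypothetical helper (not part of libseccomp): mirrors the masked-equality
/// comparator by ignoring every bit except CLONE_NEWUSER.
fn requests_new_userns(flags: u64) -> bool {
    (flags & CLONE_NEWUSER) == CLONE_NEWUSER
}

fn main() {
    let cases = [
        ("plain fork-style clone", SIGCHLD),
        ("new mount namespace only", CLONE_NEWNS | SIGCHLD),
        ("new user namespace", CLONE_NEWUSER | SIGCHLD),
        ("user + mount namespace", CLONE_NEWUSER | CLONE_NEWNS),
    ];
    for &(label, flags) in cases.iter() {
        // The filter would answer EPERM for matching calls and allow the rest.
        let verdict = if requests_new_userns(flags) { "EPERM" } else { "allow" };
        println!("{}: {}", label, verdict);
    }
}
```

The point of masking first is that the rule matches whenever the bit is set, no matter which other flags accompany it; a plain equality test would miss combined requests such as user + mount namespace.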

@topimiettinen commented on GitHub (Feb 28, 2022):

Good idea. I'd suggest a more generic command, like systemd's RestrictNamespaces= directive (https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RestrictNamespaces=), which can block multiple namespace types (cgroup, ipc, net, mnt, pid, user and uts).


@rusty-snake commented on GitHub (Feb 28, 2022):

Looking at https://github.com/systemd/systemd/blob/ee6fd6a50922d2b27c97084e1c3f9872d495c273/src/shared/seccomp-util.c#L1206, this boils down to:

if restrict_namespaces == ALL:
    # Block setns unconditionally because it is useless if all namespaces are disallowed.
    setns -> EPERM
else:
    # Otherwise block `arg1 == 0`, which lets setns join a namespace of any type
    # and would therefore bypass this restriction.
    setns(_, 0) -> EPERM

for restricted_namespace in restricted_namespaces:
    # Block unshare and setns calls which try to unshare/setns a restricted namespace.
    unshare(restricted_namespace) -> EPERM
    setns(_, restricted_namespace) -> EPERM
    # Block clone calls which try to unshare a restricted namespace.
    # NOTE: The interface of `clone` is different on different architectures.
    clone(restricted_namespace, ...) -> EPERM

# Not in systemd's `seccomp_restrict_namespaces`, but should be blocked too; see
# https://github.com/flatpak/flatpak/security/advisories/GHSA-67h7-w3jq-vh4q
# CVE-2021-41133
# https://github.com/flatpak/flatpak/commit/a10f52a7565c549612c92b8e736a6698a53db330
clone3 -> ENOSYS
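
The rules summarized above can be sketched as a small decision function. This is a std-only model of the filter's verdicts, not an actual seccomp filter and not systemd's code; `Call`, `Verdict` and `decide` are hypothetical names, the set of namespace types is the seven listed earlier in this thread, and `clone` is modelled with its flags as a single argument, as on x86-64.

```rust
// Namespace flag values from the Linux UAPI header <linux/sched.h>.
const CLONE_NEWNS: u64 = 0x0002_0000;
const CLONE_NEWCGROUP: u64 = 0x0200_0000;
const CLONE_NEWUTS: u64 = 0x0400_0000;
const CLONE_NEWIPC: u64 = 0x0800_0000;
const CLONE_NEWUSER: u64 = 0x1000_0000;
const CLONE_NEWPID: u64 = 0x2000_0000;
const CLONE_NEWNET: u64 = 0x4000_0000;
const ALL_NAMESPACES: u64 = CLONE_NEWNS | CLONE_NEWCGROUP | CLONE_NEWUTS
    | CLONE_NEWIPC | CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET;

#[derive(Debug, PartialEq)]
enum Verdict { Allow, Eperm, Enosys }

/// Hypothetical model of a syscall: only the argument carrying flags is kept.
#[derive(Clone, Copy)]
enum Call { Clone(u64), Clone3, Unshare(u64), Setns(u64), Other }

/// `restricted` is the bitmask of namespace types that must not be created or joined.
fn decide(restricted: u64, call: Call) -> Verdict {
    match call {
        // clone3 hides its flags behind a pointer, so it is refused outright.
        Call::Clone3 => Verdict::Enosys,
        // With every namespace type restricted, setns has no legitimate use.
        Call::Setns(_) if restricted & ALL_NAMESPACES == ALL_NAMESPACES => Verdict::Eperm,
        // nstype == 0 accepts any namespace type and would bypass the filter.
        Call::Setns(0) => Verdict::Eperm,
        // Any call naming at least one restricted namespace type is denied.
        Call::Clone(flags) | Call::Unshare(flags) | Call::Setns(flags)
            if flags & restricted != 0 => Verdict::Eperm,
        _ => Verdict::Allow,
    }
}

fn main() {
    let restricted = CLONE_NEWUSER;
    println!("{:?}", decide(restricted, Call::Unshare(CLONE_NEWUSER | CLONE_NEWNS)));
    println!("{:?}", decide(restricted, Call::Clone(CLONE_NEWNET)));
    println!("{:?}", decide(restricted, Call::Setns(0)));
    println!("{:?}", decide(restricted, Call::Clone3));
    println!("{:?}", decide(restricted, Call::Other));
}
```

A real filter has to express each arm as a separate rule per architecture, because the position of the `clone` flags argument differs between architectures, as noted in the summary.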

@rusty-snake commented on GitHub (Mar 18, 2022):

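So after CVE-2022-0185 (https://nvd.nist.gov/vuln/detail/CVE-2022-0185), here's the next one: CVE-2022-25636 (https://nvd.nist.gov/vuln/detail/CVE-2022-25636).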


@rusty-snake commented on GitHub (Apr 1, 2022):

And the list continues with CVE-2022-1015 (https://access.redhat.com/security/cve/CVE-2022-1015).


@rusty-snake commented on GitHub (Jun 2, 2022):

https://seclists.org/oss-sec/2022/q2/159


@rusty-snake commented on GitHub (Jul 9, 2022):

CVE-2022-32250 (https://nvd.nist.gov/vuln/detail/CVE-2022-32250)

Every month it's the same, and I don't even track them all.


@ghost commented on GitHub (Jul 9, 2022):

Just posting this here because it might be of interest:
https://blog.cloudflare.com/live-patch-security-vulnerabilities-with-ebpf-lsm/


@smitsohu commented on GitHub (Jul 14, 2022):

Is anyone working on this, or planning to?

If not I would be interested in taking it.

Maybe we can also set /proc/sys/user/max_{cgroup,ipc,mnt,net,pid,time,user,uts}_namespaces to zero if there is a noroot option...
These sysctls are namespaced and cannot be raised again inside the sandbox, because Firejail doesn't map root in the new user namespace, and also because /proc/sys is read-only. As checks happen in a different place in the kernel, I think it would increase the overall robustness.
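
As a rough illustration of the sysctl idea, the sketch below builds the eight limit paths named above and writes zero to each. It is not Firejail code: `limit_paths` and `zero_namespace_limits` are hypothetical helpers, and the base directory is a parameter so the logic can be tried against a scratch directory instead of the real /proc/sys/user.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Namespace kinds taken from the sysctl names mentioned in the comment above.
const NAMESPACE_KINDS: [&str; 8] =
    ["cgroup", "ipc", "mnt", "net", "pid", "time", "user", "uts"];

/// Hypothetical helper: expands max_{kind}_namespaces under the given directory.
fn limit_paths(proc_sys_user: &Path) -> Vec<PathBuf> {
    NAMESPACE_KINDS
        .iter()
        .map(|kind| proc_sys_user.join(format!("max_{}_namespaces", kind)))
        .collect()
}

/// Hypothetical helper: writes "0" to every limit file and stops at the first
/// error. On a real system this needs sufficient privilege, and kernels
/// without time namespaces have no max_time_namespaces file.
fn zero_namespace_limits(proc_sys_user: &Path) -> io::Result<()> {
    for path in limit_paths(proc_sys_user) {
        fs::write(&path, "0\n")?;
    }
    Ok(())
}

fn main() {
    // Only print the targets here; actually writing them is left to the caller.
    for path in limit_paths(Path::new("/proc/sys/user")) {
        println!("{}", path.display());
    }
    // Reference the writer so the sketch compiles without dead-code warnings.
    let _writer: fn(&Path) -> io::Result<()> = zero_namespace_limits;
}
```

Stopping at the first failed write is a deliberate assumption of this sketch: a limit that silently stays non-zero would defeat the purpose, so the caller should decide whether a missing file is acceptable.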


@smitsohu commented on GitHub (Jul 15, 2022):

> Maybe we can also set /proc/sys/user/max_{cgroup,ipc,mnt,net,pid,time,user,uts}_namespaces to zero if there is a noroot option...
> These sysctls are namespaced and cannot be raised again inside the sandbox, because Firejail doesn't map root in the new user namespace, and also because /proc/sys is read-only. As checks happen in a different place in the kernel, I think it would increase the overall robustness.

Or even better, unshare two user namespaces: The first user namespace only exists to impose limits on future namespace creation, by doing the equivalent of echo 1 > /proc/sys/user/max_user_namespaces. Then unshare a second time, and build the sandbox in that second user namespace.

This requires a non-privileged version of Firejail though, so we need the seccomp filter as well.


@rusty-snake commented on GitHub (Apr 29, 2023):

And more CVEs mitigated by this feature: CVE-2023-1281, CVE-2023-1829

Reference: github-starred/firejail#2832