[GH-ISSUE #3685] Warn on static binaries + seccomp #2321

Open
opened 2026-05-05 09:00:40 -06:00 by gitea-mirror · 3 comments
Owner

Originally created by @commial on GitHub (Oct 22, 2020).
Original GitHub issue: https://github.com/netblue30/firejail/issues/3685

Hi there,

From what I understand of how Firejail works:

  • (from the documentation):

if the blocked system calls would also block Firejail from operating, they are handled by adding a
preloaded library which performs seccomp system calls later.

  • This preloaded library is added in https://github.com/netblue30/firejail/blob/master/src/firejail/fs_trace.c#L105
  • Its source code is https://github.com/netblue30/firejail/blob/master/src/libpostexecseccomp/libpostexecseccomp.c

Long story short, to apply seccomp to a system call such as execve, Firejail injects code through the LD_PRELOAD mechanism.
Static binaries are not started through the dynamic loader, so the preload is ignored and the resulting process is still able to call execve.

It would be nice to have a warning that the seccomp filter will not be honored, or even an opt-in option to avoid this behavior (i.e. exit instead of launching the binary), for use cases where Firejail is used to sandbox untrusted binaries.
I don't know whether another mechanism (such as ptrace) could be used to work around this limitation.

To reproduce:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
	char *newargv[] = { "/bin/ls", "/", NULL };
	char *newenviron[] = { NULL };

	execve("/bin/ls", newargv, newenviron);
	exit(EXIT_FAILURE);
}
# dynamic version
$ firejail --noprofile --shell=none --seccomp=execve ./exec
Parent pid 14822, child pid 14823
Post-exec seccomp protector enabled
[...]
(execve is prevented)

# static version
$ firejail --noprofile --shell=none --seccomp=execve ./exec-static 
Parent pid 15030, child pid 15031
Post-exec seccomp protector enabled
Seccomp list in: execve, check list: @default-keep, postlist: execve
Child process initialized in 33.37 ms
bin    core  home	     lib    libx32	mnt   root  snap  tmp  vmlinuz
[...]

Parent is shutting down, bye...
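
One way such a runtime warning could be triggered is to check whether the target ELF file has a PT_INTERP program header: without one, no dynamic loader runs and preloaded libraries are never loaded. The following is a minimal sketch under stated assumptions, not Firejail code: the function name has_interp is hypothetical, and it only handles 64-bit ELF files in the host's byte order.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Firejail).
 * Returns 1 if the ELF64 file at `path` has a PT_INTERP segment
 * (it is started through a dynamic loader), 0 if it has none
 * (statically linked), and -1 on I/O errors or non-ELF64 input. */
static int has_interp(const char *path)
{
	FILE *fp = fopen(path, "rb");
	if (fp == NULL)
		return -1;

	int result = -1;
	Elf64_Ehdr eh;
	if (fread(&eh, sizeof(eh), 1, fp) != 1)
		goto out;
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64)
		goto out;

	/* Walk the program header table looking for PT_INTERP. */
	result = 0;
	for (unsigned i = 0; i < eh.e_phnum; i++) {
		Elf64_Phdr ph;
		long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
		if (fseek(fp, off, SEEK_SET) != 0 ||
		    fread(&ph, sizeof(ph), 1, fp) != 1) {
			result = -1;
			break;
		}
		if (ph.p_type == PT_INTERP) {
			result = 1;
			break;
		}
	}
out:
	fclose(fp);
	return result;
}
```

A caller could print a warning (or refuse to start) when has_interp() returns 0 and the post-exec syscall list is not empty. Static PIE executables also lack PT_INTERP, so the same check would cover them.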
gitea-mirror added the
bug
label 2026-05-05 09:00:41 -06:00
Author
Owner

@topimiettinen commented on GitHub (Oct 22, 2020):

Right. I think a warning in the documentation and at runtime would be appropriate.

Perhaps the problem could be avoided if, in these cases, Firejail executed a custom loader that sets up the seccomp filter and then loads the actual binary, as the kernel would do for static executables. That couldn't be bypassed.

Author
Owner

@commial commented on GitHub (Oct 23, 2020):

Thanks for your quick answer.

From what I understand, the only seccomp "post-exec" syscalls are the ones in "@default-keep":

syscalls_in_list(list, "@default-keep", fd, &prelist, &postlist, native);

Which resolves to:

# etc/templates/syscalls.txt
@default-keep=execve,prctl

Another way to convince ourselves is to ask firejail directly:

$ firejail --noprofile --shell=none --seccomp=$(firejail --debug-syscalls | awk '{print $3;}' | tr '\n' ',') ls
Parent pid 9768, child pid 9769
Post-exec seccomp protector enabled
Seccomp list in: ...
, postlist: execve,prctl

We can also write a tiny program to check for prctl:

printf("secbits = 0x%x => ", prctl(PR_GET_SECUREBITS, 0, 0, 0, 0));

And again:

$ firejail --noprofile --shell=none --seccomp=prctl ./prctl_example
Parent pid 10123, child pid 10124
Post-exec seccomp protector enabled
Seccomp list in: prctl, check list: @default-keep, postlist: prctl
(get killed)

$ firejail --noprofile --shell=none --seccomp=prctl ./prctl_example-static
Parent pid 10135, child pid 10136
Post-exec seccomp protector enabled
Seccomp list in: prctl, check list: @default-keep, postlist: prctl
Child process initialized in 21.85 ms
secbits = 0x0 => []

Parent is shutting down, bye...

So the problem affects prctl and execve, and only those two.

The custom-loader solution seems a bit overkill to me, and could actually lead to more problems (ELF parsing, etc.).
I've tried to look at how other solutions work around this problem. From what I understand, systemd effectively disallows (in the sense: "will always fail") seccomp-ing execve: (from the man page)

Note that strict system call filters may impact execution and error handling code paths of the service invocation. Specifically, access to the execve system call is required for the execution of the service binary — if it is blocked service invocation will necessarily fail

It seems to me that prctl is kept post-exec so that execve can be seccomp-ed later. But if execve is not expected to be seccomp-ed, the prctl filter could be installed before the execve, and it would then also work for static binaries. IMHO, that would remove a significant attack surface, given what prctl can do. Am I missing something?
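
The idea of filtering prctl before the execve can be sketched as follows. This is an illustration under assumptions, not Firejail's implementation: the helper names build_deny_filter and install_filter_and_exec are hypothetical, the filter returns EPERM instead of killing the process, and it omits the architecture check that a real filter should perform. Because the filter is installed before execve and is inherited across it, it would apply whether the target is statically or dynamically linked.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define DENY_FILTER_LEN 4

/* Hypothetical helper: fill `f` with a BPF program that makes syscall
 * number `nr` fail with EPERM and allows every other syscall.
 * Assumption: no architecture check, for brevity only. */
static void build_deny_filter(struct sock_filter f[DENY_FILTER_LEN], int nr)
{
	struct sock_filter prog[DENY_FILTER_LEN] = {
		/* Load the syscall number. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* If it matches, fall through to the deny; else skip it. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned)nr, 0, 1),
		BPF_STMT(BPF_RET | BPF_K,
			 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	memcpy(f, prog, sizeof(prog));
}

/* Hypothetical helper: install the filter in the current process, then
 * execve the target. Returns -1 only if something failed. */
static int install_filter_and_exec(int nr, char *const argv[],
				   char *const envp[])
{
	struct sock_filter f[DENY_FILTER_LEN];
	build_deny_filter(f, nr);
	struct sock_fprog prog = { .len = DENY_FILTER_LEN, .filter = f };

	/* Needed for unprivileged callers before installing a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	if (syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
		return -1;
	execve(argv[0], argv, envp);
	return -1;
}
```

For example, install_filter_and_exec(SYS_prctl, argv, envp) would start argv[0] with prctl already blocked, without relying on a preloaded library.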

As a side note:

From what I understand, to be able to seccomp execve, one needs to allow some other syscalls, specifically the ones used by the loader and libpostexecseccomp. These syscalls include, for instance, openat, lseek, mmap, close, ...
In such a case, what would be the expected behavior/use case? Disallowing execve but keeping a lot of likely dangerous syscalls (openat + mmap could almost load an external binary)? (I don't have the answer)

Author
Owner

@topimiettinen commented on GitHub (Oct 24, 2020):

I agree that when prctl() needs to be filtered but not execve(), there shouldn't be a need to use libpostexecseccomp.

Filtering open etc indeed breaks a lot of stuff (for example in the dynamic loader before libpostexecseccomp is loaded), so perhaps the list should be more complete. prctl() is needed to install the seccomp filters but it's indeed not the only one.

Systemd and Firejail have different approaches. Systemd runs as PID 1, which is about the most important piece of software in a system besides the kernel, so it's natural that features which could be considered too "hacky" are not very interesting there. Firejail, by contrast, is in a much more flexible position: it's OK if something doesn't work in every case, since the feature can often be disabled via per-application profiles. In the worst case it's always possible not to use Firejail, but switching PID 1 software (or not using any, init=/bin/sh?) is much more difficult. Blocking execve() with a ld.preload hack would not be OK for PID 1, but it's an interesting option for Firejail.

It's of course possible to circumvent blocked execve() with use of other system calls. In the extreme case (fileless malware) attackers don't even need execve() or open() + mmap(), if they only chain enough ROP gadgets to build a simple remote shell or whatever they want to execute.

I think a custom preloader (which wouldn't have to replace the real dynamic loader) could be interesting for other clever uses: after execve() there could be further opportunities for sandboxing. For example, seccomp actions SECCOMP_RET_TRAP and SECCOMP_RET_USER_NOTIF call a function within the thread making the system call, but this could be supplied by the preloader. A custom preloader would be overkill for blocking execve() just for statically linked applications, but if it existed, Firejail would be able to install any seccomp filters, even, for example, SECCOMP_SET_MODE_STRICT, which only allows read, write, _exit and sigreturn.
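
To illustrate how restrictive strict mode is, here is a small sketch (the function name strict_mode_demo is hypothetical, and nothing here is Firejail code). A forked child enters strict mode through prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT), performs an allowed write, then makes a getpid system call, which the kernel answers with SIGKILL. Raw syscall(SYS_exit, ...) is used because glibc's _exit() calls exit_group, which strict mode does not allow.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper.
 * Returns the number of the signal that terminated the child (SIGKILL
 * is expected), 0 if the child exited normally with status 0, and -1
 * if the setup failed (for example, seccomp is unavailable). */
static int strict_mode_demo(void)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		static const char msg[] = "in strict mode\n";

		if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
			syscall(SYS_exit, 2);	/* could not enter strict mode */

		/* write is one of the few syscalls still allowed. */
		ssize_t n = write(STDOUT_FILENO, msg, sizeof(msg) - 1);
		(void)n;

		/* Any other syscall is fatal: the kernel sends SIGKILL. */
		syscall(SYS_getpid);
		syscall(SYS_exit, 0);	/* not reached in strict mode */
	}

	int status;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFSIGNALED(status))
		return WTERMSIG(status);
	return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

A dynamically linked program could not even finish starting under such a filter if it were installed before execve, which is why a preloader that installs it after loading is the interesting case here.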

Reference: github-starred/firejail#2321