mirror of
https://github.com/netblue30/firejail.git
synced 2026-05-15 14:16:14 -06:00
[GH-ISSUE #6729] Deadlock on /run/firejail/firejail-run.lock when a firejailed process is suspended #3348
Labels
No labels
LTS merge
LTS merge
bug
bug
converted-to-discussion
doc-todo
documentation
duplicate
enhancement
file-transfer
firecfg
firejail-in-firejail
firetools
graphics
help wanted
information_old
installation
invalid
modif
moved
needinfo
networking
notabug
notourbug
old-version
overlayfs
packaging
profile-request
pull-request
question
question_old
removal
runtime-permissions
sandbox-ipc
security
stale
wiki
wiki
wontfix
wordpress
workaround
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: github-starred/firejail#3348
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @hlein on GitHub (May 1, 2025).
Original GitHub issue: https://github.com/netblue30/firejail/issues/6729
Description
If a firejail'ed process is
^Z'd at the wrong time during startup, other firejail processes will be unable to proceed because they'll wait forever for a flock on/run/firejail/firejail-run.lockSteps to Reproduce
maketarget thatssh'ed to a remote host via a SOCKS proxy listener that was dead.makein my case)Note, this is racy (interrupting between lock and unlock) so I don't expect it to be reproducible every time.
Expected behavior
Separate/independent firejails should proceed, or if not, perhaps output an error (upon timeout?).
Actual behavior
Any new firejail process hangs indefinitely at startup.
Behavior without a profile
I think as long as we grab a lock on
/run/firejail/firejail-run.lock, and allowSIGSTPto interrupt while we hold the lock, we are susceptible.Additional context
Maybe we can block
SIGSTPin the time betweenpreproc_lock_firejail_dir()andpreproc_unlock_firejail_dir()? And/or useLOCK_NBand busy-wait a reasonable amount of time so we can abort.Environment
uname -srm): Linux 6.6.83-gentoo x86_64mesa 1:24.3.3-2"): openssh-10.0_p1 (but I think it's not dependent on that)
firejail --version): 0.9.74was compiled (
git rev-parse HEAD):Checklist
/usr/bin/vlc) "fixes" it).https://github.com/netblue30/firejail/issues/1139)browser-allow-drm yes/browser-disable-u2f noinfirejail.configto allow DRM/U2F in browsers.--profile=PROFILENAMEto set the right profile. (Only relevant for AppImages)@kmk3 commented on GitHub (May 2, 2025):
Makes sense to me.
@spiiroin Any thoughts on this?
Relates to:
@spiiroin commented on GitHub (May 5, 2025):
@kmk3 I guess the worst part is that should somebody be bitten by this deadlock, it probably is not that easy to figure out what is going on. As the lock is supposed to be held only very briefly, I see no harm with blocking stop for that duration. And non-blocking lock attempts in a loop would allow sensible diagnostic output & bailout in case of failure like this (or something similar). At this stage the program is still starting up, so there is not too much to lose yet etc
@kmk3 commented on GitHub (May 7, 2025):
@spiiroin
Agreed, thanks for the assessment.
@hlein
Would you like to submit a PR for this?
@hlein commented on GitHub (May 7, 2025):
I would love to but my C is a burning trash fire. A very slow burning trash fire!
There's a chance I can assign someone to work on it in the next couple of weeks tho.
@kmk3 commented on GitHub (May 8, 2025):
This seemed straightforward enough to implement, so I made an attempt with
sigprocmask, but apparently it is not possible to mask/ignoreSIGSTOP:From GNU libc.info (glibc 2.41):
As an alternative, I thought about maybe forking into a background process just
to create the directories and then waiting on it in the main process (assuming
that background processes are not affected by ctrl+z).
Though that seems to also require
exec, in which case the code in thecritical section might have to be isolated into a specific function.
Also, note that for some lock/unlock calls, the code in between them is spread
over almost 300 lines in src/firejail/main.c, so some careful refactoring might
be needed in order to reduce that:
Thoughts on this or any other ideas?
@hlein commented on GitHub (May 8, 2025):
Ah, I think it won't be that hard: ^Z delivers
SIGSTP, notSIGSTOP.SIGSTPis what is propagated by^Zin a shell, and it can be masked/ignored/caught.At least, that's how I remember it. And a simple ^Z test of an strace'd process:
...This does mean that ignoring
SIGSTPwhile the lock is held doesn't completely solve the problem - hypothetically someone couldkill -STOPat just the wrong time. Although I think only root could do so, because during lockfile handling we are running as root. AFAIKSIGSTPcan be propagated to a process group even if there's a privilege boundary crossed, butSIGSTOPcannot. Still, you could imagine other things going wrong while the lock is held, which is why nonblocking-busy-loop lock acquisition may still be a good idea.Good point about examining the codepaths between lock with
preproc_lock_fileand unlock withpreproc_unlock_file.Also, interesting, there's both
preproc_{,un}lock_firejail_dirandpreproc_{,un}lock_firejail_network_dir... and in one codepath ofmain()we take one lock while the other is already held:I didn't look further than that yet but it may mean we need to be prepared to keep track smartly in order to restore previous signal handlers/behaviors through nested locks. Unless that can be turned into
lock_network/unlock_network+lock/unlock.@kmk3 commented on GitHub (May 9, 2025):
Ah I see it now; it should be fine to use
sigprocmaskthen.The name was written multiple times as
SIGSTP(without the T) rather thanSIGTSTP. I couldn't find a reference to it, so I assumed that either it wasmeant as
SIGSTOPor that it was just an alternative/archaic name forSIGSTOP.How long do you think the attempt interval should be?
Firejail seems to take anywhere from ~10ms (with
--noprofile) to over 1000ms(depending on the profile) to configure the sandbox and start the program.
Any idea how long it usually takes from the first lock to the last unlock?
Indeed, I noticed that as well; not sure if it's a problem.
I originally did it in a few lines by masking/unmasking
SIGTSTPinside thelock/unlock functions, though unconditionally unmasking it might be undesirable
if
SIGTSTPwas already masked prior to the locking. Storing the signal setin a global variable could potentially work, but it feels like it could lead to
issues down the line if more concurrency code is introduced.
So maybe the most obvious way would be to mask
SIGTSTPbefore calling thefirst lock and restore the original mask after the last unlock.
By the way, there is already one place using
sigprocmaskin the code:@hlein commented on GitHub (May 9, 2025):
Oh you're right, I can't type,
s/SIGSTP/SIGTSTP/g!I don't really know how long to expect the locking to take - we don't keep the lock for full startup, but only for some specific things. Those "specific things" probably vary by profile, but not necessarily proportional to the complexity of the app.
So Claude and I whipped up a little test harness, and I ran a few apps through it.
tl;dr: longest I ever see us holding a lock is under 8 milliseconds (on this box anyway). So we could probably busy-loop for like 20ms, then start backing off until we get to 0.5 sec or so before aborting?
Then I bumped up to 100 iterations, and ran 4 ls in parallel to potentially cause some contention; the worst of each set:
Removed the cooldown, expected to get worse results from more contention, but actually no:
Running
muttgot slightly higher maxes but still below 8 ms:Here's the script:
lock_timing.pl.txt
@kmk3 commented on GitHub (May 12, 2025):
Very nice, thanks for the measurements!
Sounds good.
By the way, I noticed that there is already some locking + signal handling
happening on src/firejail/main.c (
set_sandbox_run_file+install_handler):Though this lock is for the instance-specific directory in
/run/firejail/profile/<PID>(using the child process PID) and the signalhandling seems to be intended just to ensure that the child process is
terminated if the parent process receives
SIGINT/SIGTERM:So maybe mask
SIGTSTPbeforepreproc_lock_firejail_dirand restore it afterrelease_sandbox_lock?Though there is one potential
execvpbetweenpreproc_unlock_firejail_network_dirandrelease_sandbox_lock, so maybe itwould be better to restore the mask right after
preproc_unlock_firejail_network_dir.There is also code that creates and handles child processes; not sure if the
signal masking would be an issue there.
I'm currently working on a few other things; it would be great to have more
people working on improving the main code.
@hlein commented on GitHub (May 14, 2025):
fyi @kmk3, @jlimor-kl has been working on this and will probably have a PR shortly.
@jlimor-kl commented on GitHub (May 15, 2025):
I have also written a first draft of the non-blocking lock + busy wait backoff for flock, would you prefer I fold this into the existing PR or wait until that one has been merged to submit a second PR?
@kmk3 commented on GitHub (May 16, 2025):
To simplify the review, I think it would be better to wait for the existing PR,
as the underlying bug is trickier than usual (since it involves a race
condition) and these are 2 related but independent logical changes.