Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -O2 -flto=auto -ffat-lto-objects -fexceptions -g 
-grecord-gcc-switches -pipe -Wall -Werror=format-security 
-Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64 
-mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection 
-fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer 
-mno-omit-leaf-frame-pointer -std=gnu17
uname output: Linux glimmerlight 6.19.12-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC 
Sun Apr 12 15:26:33 UTC 2026 x86_64 GNU/Linux
Machine Type: x86_64-redhat-linux-gnu

Bash Version: 5.3
Patch Level: 0
Release Status: release

Description:
Hello! Recently I was debugging a very strange failure in our systemd test
suite, and after some digging I noticed it was caused by a bash script used
as a stub init for a test container. To test whether signal delivery to the
container's init works properly, the stub init defines traps for various
signals, which are then probed and checked. However, in a couple of cases the
stub init entered a busy loop and trashed the machine. A minimized version of
the stub init can be found in the Repeat-By section below.

As for what I think is happening in the stub init/reproducer: suppose traps
are defined for SIGNAL1, SIGINT, and SIGNAL2, where the signal number of
SIGNAL1 is greater than that of SIGNAL2, and the trap handler for SIGNAL1
calls an external (non-builtin) command. Now, if you send such a script the
signal sequence SIGNAL1, SIGINT, SIGNAL2, the following may happen:
   - wait is waiting in waitpid, SIGNAL1 arrives, catch_flag is set to 1 via
     set_trap_state(), wait returns 128+SIGNAL1
   - bash loops over pending_traps, arrives at SIGNAL1, and executes (forks)
     the external command from its trap handler
   - wait_for() sets the SIGINT handler to wait_sigint_handler()
   - SIGINT arrives, wait_sigint_handler() runs (so no set_trap_state() here),
     wait_sigint_received is set to 1
   - SIGNAL2 arrives, set_trap_state() sets catch_flag to 1 and
     pending_traps[SIGNAL2] to 1
   - the waited-for external command exits, set_job_status_and_cleanup()
     notices that
     wait_sigint_received && child_caught_sigint && signal_is_trapped(SIGINT)
     holds true and runs run_interrupt_trap()
   - (!!) run_interrupt_trap() unconditionally resets catch_flag to 0 even
     though we still have a pending signal (SIGNAL2)
   - control returns to run_pending_traps(), but since SIGNAL1 > SIGNAL2,
     we're already past SIGNAL2 in the loop, so the handling of this signal
     gets skipped and the function returns
   - control goes back to the main loop where wait is called again:
     first_pending_signal() finds SIGNAL2 in the pending_traps array, so wait
     returns 128+SIGNAL2
   - run_pending_traps() is called again, but this time catch_flag is 0, so
     the function returns immediately
   - and now the busy loop continues: control goes back to the main loop,
     wait is called again, first_pending_signal() finds SIGNAL2, wait returns
     128+SIGNAL2, run_pending_traps() finds that catch_flag is 0 and
     immediately returns
   - ...
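
To make the sequence above easier to follow, here is a rough toy model of the
flag and array interplay in C. It is only a sketch and not bash's code: every
model_* name, the tiny signal numbers, and the "inject" parameter are made up
for illustration, and it assumes that run_pending_traps() clears catch_flag
before scanning pending_traps in ascending order. The "guarded" parameter
switches between the unconditional reset described above and the conditional
reset proposed in the Fix section of this report.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model only: hypothetical names and signal numbers, not bash source. */
enum { MODEL_NSIG = 8, MODEL_SIGINT = 2, MODEL_SIGNAL2 = 3, MODEL_SIGNAL1 = 5 };

struct trap_model {
    int catch_flag;              /* stands in for catch_flag */
    int pending[MODEL_NSIG];     /* stands in for pending_traps[] */
    int handled[MODEL_NSIG];     /* how often each trap handler ran */
};

/* Lowest pending signal number, or -1 if nothing is pending. */
static int model_first_pending(const struct trap_model *m)
{
    for (int sig = 1; sig < MODEL_NSIG; sig++)
        if (m->pending[sig])
            return sig;
    return -1;
}

/* What the report says set_trap_state() does for a trapped signal. */
static void model_signal_arrives(struct trap_model *m, int sig)
{
    m->pending[sig] = 1;
    m->catch_flag = 1;
}

/* Models run_interrupt_trap(): guarded == false is the unconditional reset. */
static void model_run_interrupt_trap(struct trap_model *m, bool guarded)
{
    m->pending[MODEL_SIGINT] = 0;
    if (!guarded || model_first_pending(m) == -1)
        m->catch_flag = 0;
    m->handled[MODEL_SIGINT]++;
}

/* Models run_pending_traps(): a single ascending pass over pending[]. */
static void model_run_pending_traps(struct trap_model *m, bool guarded, bool inject)
{
    if (m->catch_flag == 0)
        return;
    m->catch_flag = 0;           /* assumption: flag is cleared before the scan */
    for (int sig = 1; sig < MODEL_NSIG; sig++) {
        if (!m->pending[sig])
            continue;
        m->pending[sig] = 0;
        m->handled[sig]++;
        if (inject && sig == MODEL_SIGNAL1) {
            /* While SIGNAL1's external command runs: SIGNAL2 arrives, then
               the child exits and the SIGINT trap is run. */
            model_signal_arrives(m, MODEL_SIGNAL2);
            model_run_interrupt_trap(m, guarded);
        }
    }
}

/* Counts main-loop rounds in which wait returned at once but no trap ran. */
static int model_count_stuck_rounds(bool guarded, int max_rounds)
{
    struct trap_model m;
    int stuck = 0;

    memset(&m, 0, sizeof m);
    model_signal_arrives(&m, MODEL_SIGNAL1);       /* wait returns 128+SIGNAL1 */
    model_run_pending_traps(&m, guarded, true);
    for (int round = 0; round < max_rounds; round++) {
        if (model_first_pending(&m) == -1)
            break;                                 /* wait would block again */
        int before = m.handled[MODEL_SIGNAL2];
        model_run_pending_traps(&m, guarded, false);
        if (m.handled[MODEL_SIGNAL2] == before)
            stuck++;
    }
    return stuck;
}
```

In this model, model_count_stuck_rounds(false, 50) returns 50 (every round
spins without handling SIGNAL2), while model_count_stuck_rounds(true, 50)
returns 0. That only shows the model is consistent with the description; it
says nothing about side effects in the real shell.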

I also tried this with the latest git version (commit
637f5c8696a6adc9b4519f1cd74aa78492266b7f at the time of writing) compiled in
a Fedora Rawhide VM, and the behaviour is the same.

Repeat-By:
$ cat test.sh
#!/bin/bash

set -x

: "Leader: $$"

PID=0

trap 'echo RTMIN+3' RTMIN+3
trap 'echo RTMIN+4; sleep 1' RTMIN+4
trap 'echo INT' INT

sleep infinity &
PID=$!

while :; do
     wait "$PID" || :
done

## In one terminal
$ ./test.sh |& tee log.txt

## In a second one
$ PID=1801028; kill -RTMIN+4 $PID; sleep .1; kill -INT $PID; sleep .1; kill -RTMIN+3 $PID

## This should cause the test.sh script to end up in a busy-loop:
$ head -n 30 log.txt
+ : 'Leader: 1801028'
+ PID=0
+ trap 'echo RTMIN+3' RTMIN+3
+ trap 'echo RTMIN+4; sleep 1' RTMIN+4
+ trap 'echo INT' INT
+ PID=1801030
+ :
+ wait 1801030
+ sleep infinity
++ echo RTMIN+4
RTMIN+4
++ sleep 1
+++ echo INT
INT
+ :
+ :
+ wait 1801030
+ :
+ :
+ wait 1801030
+ :
+ :
...

Fix:
A naive solution would be to reset catch_flag in run_interrupt_trap() only if
there's no pending signal, i.e.:

diff --git a/trap.c b/trap.c
index 5cf8a1bd..dc806822 100644
--- a/trap.c
+++ b/trap.c
@@ -1351,7 +1351,8 @@ run_interrupt_trap (int will_throw)
    if (will_throw && running_trap > 0)
      run_trap_cleanup (running_trap - 1);
    pending_traps[SIGINT] = 0;   /* run_pending_traps does this */
-  catch_flag = 0;
+  if (first_pending_trap () == -1)
+    catch_flag = 0;
    _run_trap_internal (SIGINT, "interrupt trap");
  }

This does seem to help in this case, but I'm not sure about any potential
side effects the change could have.

