On 3/6/26 11:49, Chet Ramey wrote:
> I tried for several hours to reproduce your results on Fedora 42 with the
> latest devel branch code (otherwise, why bother looking for a fix?) and I
> simply cannot. I mixed the signal delivery order with /usr/bin/kill, added
> a short timeout, and tried several other variants to reproduce the issue.
>
> The closest I got was sending SIGHUP first, then delaying the SIGTERM. That
> sometimes ran the loop in the exit trap once before the SIGTERM was
> delivered (whether running the exit trap after the sleep terminated or
> after the shell received the SIGHUP), but then the SIGTERM was delivered,
> the SIGTERM trap ran, and the shell exited.
>
> Maybe you can help me think through what might be going on in your test.

My first hypothesis was that there was confusion between the EXIT trap
caused by "sleep" being killed (and therefore interpreted as a failure
under "set -e") and the TERM trap. However, it seems the execution of the
EXIT trap is always due to HUP being received, which I was not paying
sufficient attention to.

I have rerun several scenarios that produce consistent results (each
repeated 10 times with the same output). The timeout is the value passed
to /usr/bin/kill before sending SIGHUP, the main signal being SIGTERM (as
should be the case in a reboot). I have added a HUP trap playing the same
role as the TERM trap (output + re-kill), and tested various
configurations. The tables below show the outcome of each configuration.

current bash (5.1.8, RHEL 9) (with or without set -e)

  timeout  with-HUP-trap  without-HUP-trap
  0        kill -HUP (1)  kill -TERM followed by EXIT trap (!)
  1        kill -TERM     kill -TERM

(!) This is what triggered this email thread: the re-kill with SIGTERM
does not stop the script, which then runs the entire EXIT trap to
completion:
+ sleep infinity
+++ termtrap
+++ trap - EXIT HUP TERM
+++ echo termtrap
termtrap
+++ return 0
+++ kill -TERM 1318803
++ exittrap   # this, and the complete EXIT trap execution that follows, is unexpected
...
(1) with following output:
foo.sh: line 30: warning: run_pending_traps: bad value in trap_list[15]: (nil)
foo.sh: line 30: warning: run_pending_traps: signal handler is SIG_DFL, resending 15 (SIGTERM) to myself

devel bash (CFLAGS=-DDEBUG) (with or without set -e, but see notes)

  timeout  with-HUP-trap  without-HUP-trap
  0        kill -HUP      kill -TERM (3)
  1        kill -HUP (2)  EXIT trap (4)

(2) with following output:
+ sleep infinity
Terminated sleep infinity
++ termtrap # this line only without set -e
foo.sh: line 20: DEBUG warning: run_pending_traps: recursive invocation while running trap for signal 15
+++ huptrap
(3) with following output:
+ sleep infinity
foo.sh: DEBUG warning: run_pending_traps: recursive invocation while running trap for signal 0
+++ termtrap
(4) with following output:
+ sleep infinity
Terminated sleep infinity
++ termtrap # this line only without set -e
+++ exittrap

In all cases, if we ignore HUP in the script, all tests end with
"kill -TERM", cleanly.

My conclusion: without handling HUP, a reboot creates a race between HUP
and TERM, and the results are unpredictable. With a trap for both HUP and
TERM, these take precedence over the EXIT trap, but we cannot be sure
which will be executed. Ignoring the HUP signal seems to provide the
cleanest execution under reboot, avoiding the race condition in the shell.

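A minimal sketch of the "ignore HUP" approach follows. It is not the script used for the tables above: the temporary file names, the 30-second sleep and the fixed delays are arbitrary choices, and the sleep is run in the background with `wait` (an assumption of this demo) so that the TERM trap runs without waiting for the sleep to end.

```shell
#!/usr/bin/env bash
# Sketch only: the file names, the 30 s sleep and the delays are arbitrary.
workdir=$(mktemp -d)

cat >"$workdir/child.sh" <<'EOF'
trap '' HUP                        # ignore SIGHUP so it cannot race with SIGTERM
termtrap() {
    trap - TERM                    # default disposition, so the re-kill terminates us
    echo "termtrap"
    kill "$sleep_pid" 2>/dev/null  # do not leave the background sleep behind
}
trap 'termtrap; kill -TERM $$' TERM
sleep 30 &                         # background + wait: the trap can run immediately
sleep_pid=$!
echo "ready"
wait "$sleep_pid"
EOF

bash "$workdir/child.sh" >"$workdir/out" 2>&1 &
child=$!

# Wait (up to about 5 s) until the child has installed its traps.
for _ in $(seq 1 50); do
    grep -q ready "$workdir/out" 2>/dev/null && break
    sleep 0.1
done

kill -HUP "$child"                 # discarded by the child
sleep 0.2
kill -TERM "$child"                # runs termtrap, then the re-kill
wait "$child"
status=$?
output=$(cat "$workdir/out")
rm -rf "$workdir"

echo "output: $output"
echo "status: $status"
```

With HUP ignored, the only trap that can run is the TERM one, and the parent should see the status of a process terminated by SIGTERM (128+15).
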
It's still unclear how to classify the fact that the re-kill with SIGTERM
under bash 5.1.8 did not stop the script and continued with the EXIT trap.
That might just fall under some "undefined behavior" due to race
conditions between signals. The devel version of bash does not show that
"anomaly" but can still execute the EXIT trap even after receiving the
TERM signal.

Thanks for your analysis and questions. I was about to add "exit(128+n)"
after each of these re-kill(self, n) calls, not trusting that they would
stop the process. It looks like ignoring HUP is a better solution.

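For reference, the "re-kill, then exit(128+n)" fallback mentioned above could be sketched as below. `die_by_signal` is a hypothetical helper name, not something bash provides, and the demo runs it in a separate bash process so that `$$` refers to that process only.

```shell
#!/usr/bin/env bash
# die_by_signal is a hypothetical helper name, not part of bash.
die_by_signal() {
    local sig=$1 num
    num=$(kill -l "$sig")       # bash builtin: signal name -> number (TERM -> 15)
    trap - "$sig"               # restore the default disposition
    kill -s "$sig" "$$"         # normally terminates the shell right here
    exit $((128 + num))         # fallback, reached only if the shell is still alive
}

# Demo in a separate bash process, so that $$ refers to that process only.
bash -c "$(declare -f die_by_signal)
trap 'die_by_signal TERM' TERM
kill -TERM \$\$
sleep 5    # not reached
"
status=$?
echo "status seen by the parent: $status"
```

Either way the parent shell sees the same status, whether the process died from the signal or fell through to the explicit exit, which is why the fallback is only a safety net and does not address the HUP/TERM race itself.
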
Regards,
--
Daniel Villeneuve
PS: the final script, for reference:

function exittrap() {
    trap - EXIT HUP TERM
    echo "exittrap"
    i=0
    while (( ++i <= 2 )); do
        echo "cleanup $i"
        sleep 1
    done
    return 0
}

function huptrap() {
    trap - EXIT HUP TERM
    echo "huptrap"
    return 0
}

function termtrap() {
    trap - EXIT HUP TERM
    echo "termtrap"
    return 0
}

set -x
trap 'exittrap' EXIT
trap 'huptrap; kill -HUP $$' HUP # comment line for no trap, use '' to ignore
trap 'termtrap; kill -TERM $$' TERM
sleep infinity
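
To rerun the test matrix, a driver along the following lines could be used. It is a sketch under assumptions, not the harness used for the tables: `run_case` and `case_status` are hypothetical names, the delay is implemented with `sleep` and the bash builtin `kill` rather than with the timeout of /usr/bin/kill, and the signals are sent to the script's whole process group (via job control) on the assumption that, as in a reboot, the foreground `sleep` receives them too.

```shell
#!/usr/bin/env bash
# run_case SCRIPT DELAY
#   Start SCRIPT in its own process group, send SIGTERM to the group,
#   wait DELAY seconds, send SIGHUP, then record the exit status in
#   the global variable case_status.
run_case() {
    local script=$1 delay=$2 pid
    set -m                              # job control: the job becomes a process group leader
    bash "$script" &
    pid=$!
    set +m                              # note: this also turns off -m if the caller had it on
    sleep 0.5                           # crude: let the script install its traps
    kill -TERM -- "-$pid" 2>/dev/null   # main signal, to the whole group
    sleep "$delay"
    kill -HUP -- "-$pid" 2>/dev/null    # follow-up signal; the group may already be gone
    wait "$pid"
    case_status=$?
    echo "delay=$delay status=$case_status"
}
```

For example, `run_case ./foo.sh 0` and `run_case ./foo.sh 1` would correspond to the two timeout rows. Since the point of the thread is that the outcome is a race, the reported status for foo.sh can differ between bash versions and between runs.
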