On 3/6/26 11:49, Chet Ramey wrote:

     I tried for several hours to reproduce your results on Fedora 42 with
     the latest devel branch code (otherwise, why bother looking for a fix?)
     and I simply cannot. I mixed the signal delivery order with
     /usr/bin/kill, added a short timeout, and tried several other variants
     to reproduce the issue. The closest I got was sending SIGHUP first,
     then delaying the SIGTERM. That sometimes ran the loop in the exit trap
     once before the SIGTERM was delivered (whether running the exit trap
     after the sleep terminated or after the shell received the SIGHUP), but
     then the SIGTERM was delivered, the SIGTERM trap ran, and the shell
     exited.

     Maybe you can help me think through what might be going on in your
     test.

   My first hypothesis was that there was confusion between the EXIT trap
   caused by "sleep" being killed (and therefore interpreted as a failure
   under "set -e") and the TERM trap.  However, it seems the execution of
   the EXIT trap is always due to HUP being received, which I was not paying
   sufficient attention to.  I have rerun several scenarios that produce
   consistent results (each repeated 10 times with the same output).  The
   timeout is the value passed to /usr/bin/kill before sending SIGHUP, the
   main signal being SIGTERM (as should be the case in a reboot).  I have
   added a HUP trap playing the same role as the TERM trap (output +
   re-kill), and tested various configurations.  The tables below show the
   outcome of each configuration.
   current bash (5.1.8, RHEL 9) (with or without set -e)
   timeout    with-HUP-trap    without-HUP-trap
   0          kill -HUP (1)    kill -TERM followed by EXIT trap (!)
   1          kill -TERM       kill -TERM
   (!) This is what triggered this email thread: the re-kill with SIGTERM
   does not stop the script, which then runs the entire EXIT trap:
   + sleep infinity
   +++ termtrap
   +++ trap - EXIT HUP TERM
   +++ echo termtrap
   termtrap
   +++ return 0
   +++ kill -TERM 1318803
   ++ exittrap # this and the following complete EXIT trap execution is unexpected
   ...
   (1) with the following output:
   foo.sh: line 30: warning: run_pending_traps: bad value in trap_list[15]: (nil)
   foo.sh: line 30: warning: run_pending_traps: signal handler is SIG_DFL, resending 15 (SIGTERM) to myself
   devel bash (CFLAGS=-DDEBUG) (with or without set -e, but see notes)
   timeout    with-HUP-trap    without-HUP-trap
   0          kill -HUP        kill -TERM (3)
   1          kill -HUP (2)    EXIT trap (4)
   (2) with the following output:
   + sleep infinity
   Terminated                 sleep infinity
   ++ termtrap # this line only without set -e
   foo.sh: line 20: DEBUG warning: run_pending_traps: recursive invocation while running trap for signal 15
   +++ huptrap
   (3) with the following output:
   + sleep infinity
   foo.sh: DEBUG warning: run_pending_traps: recursive invocation while running trap for signal 0
   +++ termtrap
   (4) with the following output:
   + sleep infinity
   Terminated                 sleep infinity
   ++ termtrap # this line only without set -e
   +++ exittrap
   In all cases, if we ignore HUP in the script, all tests end cleanly with
   "kill -TERM".
   My conclusion: without handling HUP, a reboot creates a race between HUP
   and TERM and the results are unpredictable.  With a trap for both HUP and
   TERM, these take precedence over the EXIT trap, but we cannot be sure
   which will be executed.  Ignoring the HUP signal seems to provide the
   cleanest execution under reboot, since it avoids the race condition in
   the shell.
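For illustration, here is a minimal sketch of the "ignore HUP" configuration together with a small driver that sends HUP and then TERM. The file names, the "ready" marker and the background "sleep 30 & wait" are my assumptions for a self-contained demo (the posted script sleeps in the foreground, and a reboot signals the sleep as well), so this is not a reproduction of the tables above.

```shell
#!/usr/bin/env bash
# Sketch only: $script is a hypothetical stand-in for foo.sh.  It waits on a
# background sleep (an assumption) so that the TERM trap runs as soon as the
# signal arrives, without having to signal the sleep separately.
script=$(mktemp)
out=$(mktemp)
cat >"$script" <<'EOF'
termtrap() {
  trap - EXIT HUP TERM            # restore default dispositions
  echo "termtrap"
  kill "$child" 2>/dev/null       # do not leave the sleep behind
}
trap '' HUP                       # ignore HUP: only TERM ends the script
trap 'echo exittrap' EXIT
trap 'termtrap; kill -TERM $$' TERM
sleep 30 >/dev/null 2>&1 & child=$!
echo "ready"
wait "$child"
EOF

bash "$script" >"$out" 2>&1 &
pid=$!
# Wait (at most about 5 s) until the traps are installed.
for _ in $(seq 50); do
  grep -q ready "$out" && break
  sleep 0.1
done
kill -HUP "$pid"                  # ignored by the script
kill -TERM "$pid"                 # runs termtrap, then the re-kill ends the shell
wait "$pid"
status=$?
output=$(cat "$out")
printf '%s\n' "$output"
echo "status=$status"
rm -f "$script" "$out"
```

If the shell behaves as described above, the output should contain "termtrap" but not "exittrap", and the reported status should be 143 (128 + 15), meaning the script died from the re-sent SIGTERM.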
   It's still unclear how to classify the fact that the re-kill with SIGTERM
   under bash 5.1.8 did not stop the script and continued with the EXIT
   trap.  That might just fall under some "undefined behavior" due to race
   conditions between signals.  The devel version of bash does not show that
   "anomaly" but can still execute the EXIT trap even after receiving the
   TERM signal.
   Thanks for your analysis and questions.  I was about to add "exit(128+n)"
   after each of these re-kill(self, n) calls, not trusting that they would
   stop the process.  It looks like ignoring HUP is a better solution.
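For the record, the "exit(128+n)" fallback I had in mind would look roughly like the sketch below. The helper name rekill_or_exit is hypothetical and not part of the script in the PS; the subshell at the end is only there so the demo does not kill the calling shell.

```shell
#!/usr/bin/env bash
# rekill_or_exit is a hypothetical helper, not part of the posted script.
rekill_or_exit() {
  local sig=$1 num
  num=$(kill -l "$sig")        # signal name -> number, e.g. TERM -> 15
  trap - EXIT HUP TERM         # restore default dispositions first
  kill -s "$sig" "$BASHPID"    # re-raise; normally this ends the process
  exit $((128 + num))          # fallback: the status a signal death would report
}

# Intended use in a script:  trap 'termtrap; rekill_or_exit TERM' TERM
# Demo in a subshell so the calling shell survives.
( rekill_or_exit TERM )
status=$?
echo "status=$status"
```

Either way the caller sees status 143 for TERM, although with the fallback it can no longer tell from the wait status alone that the process was actually killed by the signal, which is one more reason to prefer ignoring HUP.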
   Regards,
   --
   Daniel Villeneuve
   PS: the final script, for reference:
   function exittrap() {
     trap - EXIT HUP TERM
     echo "exittrap"
     i=0
     while (( ++i <= 2 )); do
       echo "cleanup $i"
       sleep 1
     done
     return 0
   }
   function huptrap() {
     trap - EXIT HUP TERM
     echo "huptrap"
     return 0
   }
   function termtrap() {
     trap - EXIT HUP TERM
     echo "termtrap"
     return 0
   }
   set -x
   trap 'exittrap' EXIT
   trap 'huptrap; kill -HUP $$' HUP # comment out this line for no trap, use '' to ignore
   trap 'termtrap; kill -TERM $$' TERM
   sleep infinity
