Hello,

Periodically we see bash processes which run in busy loops:

$ strace -fp 15264
strace: Process 15264 attached
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigreturn({mask=[QUIT]}) = 143
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigreturn({mask=[QUIT]}) = 143
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigreturn({mask=[QUIT]}) = 143
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigreturn({mask=[QUIT]}) = 143

$ gdb -p 15264
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) si
termsig_sighandler (sig=11) at sig.c:488
488	      sig != SIGHUP &&
(gdb) s
486	  if (
(gdb)
523	    sig == terminating_signal)
(gdb)
521	      sig != SIGUSR2 &&
(gdb)
526	  terminating_signal = sig;
(gdb)
534	  if (interactive_shell == 0 || interactive == 0 || (sig != SIGHUP && sig != SIGTERM) || no_line_editing || (RL_ISSTATE (RL_STATE_READCMD) == 0))
(gdb)
536	    history_lines_this_session = 0;
(gdb)

Breakpoint 1, termsig_sighandler (sig=11) at sig.c:539
539	      termsig_handler (sig);
(gdb)
termsig_handler (sig=<optimized out>) at sig.c:564
564	  if (handling_termsig)
(gdb)
termsig_sighandler (sig=11) at sig.c:538
538	      terminate_immediately = 0;
(gdb)
539	      termsig_handler (sig);
(gdb)
termsig_handler (sig=11) at sig.c:564
564	  if (handling_termsig)
(gdb)
termsig_sighandler (sig=11) at sig.c:548
548	  if (RL_ISSTATE (RL_STATE_SIGHANDLER) || RL_ISSTATE (RL_STATE_TERMPREPPED))
(gdb)
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0x5b0000006e

0x0000000000000000 in ?? ()

I decided to investigate this problem and found that bash handles all
signals itself. If it decides that a signal should be fatal, it restores
the default signal handler and sends the same signal to itself.

Unfortunately, this doesn't work when the bash process is the init
process of its pid namespace, because an init process ignores signals
sent from within its own namespace unless it has installed a handler for
them.
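
To make the pattern concrete, here is a minimal sketch of the same "restore the default action and re-send the signal" idiom written at the script level. It only mirrors the idea; it is not the code path inside bash's sig.c. The names reraise_child, on_term and run_reraise_child are made up for this illustration.

```shell
# Hypothetical child script: catch SIGTERM once, then restore the
# default action and re-send the signal to the same process.
reraise_child='
on_term() {
    echo "cleanup"
    trap - TERM      # restore the default disposition
    kill -TERM $$    # re-send the signal to ourselves
    echo "survived"  # not reached in an ordinary process
}
trap on_term TERM
kill -TERM $$
echo "still running"
'

# Run the child in a separate bash so that the caller survives.
run_reraise_child() {
    bash -c "$reraise_child"
}
```

In an ordinary process, `run_reraise_child; echo $?` prints "cleanup" and reports status 143 (128 + SIGTERM), because the re-sent signal terminates the child. When the child is the init process of a pid namespace, the re-sent signal is presumably dropped, so execution would continue past the second kill.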

It can be easily reproduced:

[avagin@laptop issue]$ cat run.sh
#!/bin/bash

set -e -m

unshare -fp sh -c 'exec -a init1234 bash init.sh' &
sleep 3
pid=`pidof init1234`
kill $pid
sleep 3
kill -9 $pid
[avagin@laptop issue]$
[avagin@laptop issue]$ cat init.sh
#!/bin/bash

function finish {
	echo Exit trap
}
trap finish EXIT

while :; do
	sleep 1;
	echo "Alive"
done
[avagin@laptop issue]$ unshare -Ur bash -x run.sh
+ set -e -m
+ sleep 3
+ unshare -fp sh -c 'exec -a init1234 bash init.sh'
Alive
Alive
++ pidof init1234
+ pid=30916
+ kill 30916
+ sleep 3
Exit trap
Alive
Alive
Alive
+ kill -9 30916

You can see that the process keeps running after the exit trap has
fired. This is obviously a bug.

Another bad thing here is that termsig_handler() works only once:

void
termsig_handler (sig)
     int sig;
{
  static int handling_termsig = 0;

  /* Simple semaphore to keep this function from being executed multiple
     times.  Since we no longer are running as a signal handler, we don't
     block multiple occurrences of the terminating signals while running. */
  if (handling_termsig)
    goto out;
  handling_termsig = 1;
  terminating_signal = 0;	/* keep macro from re-testing true. */

So if we kill the test process with SIGTERM, it sets the default handler
for SIGTERM and sends SIGTERM to itself again. As the process is the
init process in its pidns, the signal is ignored, and the process
continues running.

If the process later triggers SIGSEGV (e.g. by dereferencing unmapped
memory), the kernel sends SIGSEGV and bash executes its signal handler
(termsig_sighandler), but it doesn't try to set the default signal
handler, because termsig_handler() has already been executed once. So
the process returns from the signal handler, triggers SIGSEGV again,
and so on.

I am thinking about how to fix this issue properly. Here are a few
points:

* bash should know that signals are ignored if a process is the init
  process in a pid namespace.
* signals sent by users and signals generated by the kernel (SIGSEGV,
  SIGFPE, SIGBUS, etc.) are handled differently
* bash should not return from termsig_sighandler() if it has sent a
  signal to itself.

Do you have any ideas, advice, or comments about this problem?

Thanks,
Andrei
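
A minimal sketch of the last point in the list, again at the script level rather than inside bash: after re-sending the signal, exit explicitly instead of returning from the handler. To avoid needing a pid namespace, the sketch stands in for "the re-sent signal is ignored" by setting the signal to be ignored. The names fallback_child, on_term, run_fallback_child and FALLBACK are made up for this illustration, and 143 is simply the conventional 128 + SIGTERM status.

```shell
# Hypothetical child script.  FALLBACK=1 enables the explicit exit.
fallback_child='
on_term() {
    trap "" TERM     # stand-in for a re-sent signal that has no effect
    kill -TERM $$    # re-send: ignored, so execution continues here
    if [ "$FALLBACK" = 1 ]; then
        exit 143     # 128 + SIGTERM: do not return from the handler
    fi
}
trap on_term TERM
kill -TERM $$
echo "still running"
'

# $1 is 1 to enable the fallback exit, 0 to return from the handler.
run_fallback_child() {
    FALLBACK=$1 bash -c "$fallback_child"
}
```

With `run_fallback_child 0` the handler returns and the script keeps going, which is the behaviour from the report. With `run_fallback_child 1` the child exits with status 143 without printing anything.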