2015-09-19 21:28:24 -0400, Chet Ramey:
> On 9/19/15 5:31 PM, Stephane Chazelas wrote:
> > 2015-09-19 16:42:28 -0400, Chet Ramey:
> > [...]
> >> I'm surprised you've managed to avoid the dozen or so discussions on the
> >> topic.
> >>
> >> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html
> > [...]
> >
> > Thanks for the links. I still think the comments on the second
> > article I sent
> > (http://thread.gmane.org/gmane.comp.shells.bash.bugs/24178/focus=24183)
> > still hold though and from a quick read I don't see those points
> > being mentioned in the past discussions (but that was a quick
> > read).
> >
> > I notice that you mention the race conditions have been fixed,
> > but I'm still seeing some non-deterministic behaviour.
>
> I can't reproduce this on Mac OS X and RHEL 6 and 7, the systems I have
> readily available today.
>
> The shell notes when it sees SIGINT and whether or not waitpid returns
> -1/EINTR. If the sleep exits due to SIGINT, even after the waitpid
> returns -1, the shell assumes it didn't catch and handle the SIGINT and
> the shell calls the trap handler.
[...]

To clarify, in

  bash -c 'sh -c "trap exit INT; sleep 99; :"; echo hi'

the command under test is "bash", not "sh". The "sh" is just
there as a cmd that does exit() upon receiving SIGINT.

It's just:

  bash -c 'cmd; echo hi'

You can replace "cmd" with:

  perl -e '$SIG{INT}= sub{exit}; sleep'

(or

  mksh -c 'sleep 10; :'

which does an exit(130) upon receiving SIGINT).

The problem here is that when you press CTRL-C, SIGINT is sent
to all the processes in the foreground process group, so to both
"bash" and "cmd".

Now, bash works as expected only if it handles its own SIGINT
before the child has caught its own and exited.

When the above code exits without printing "hi", we see for
instance this call stack (breakpoint on kill() in gdb):

#0  kill () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045dd8e in termsig_handler (sig=<optimized out>) at sig.c:588
#2  0x000000000045ddef in termsig_handler (sig=<optimized out>) at sig.c:554
#3  0x00000000004466bb in set_job_status_and_cleanup (job=0) at jobs.c:3539
#4  waitchld (block=block@entry=1, wpid=20802) at jobs.c:3316
#5  0x000000000044733b in wait_for (pid=20802) at jobs.c:2485
#6  0x0000000000437992 in execute_command_internal (command=command@entry=0x70aa48,
    asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bb68)
    at execute_cmd.c:829
#7  0x0000000000437b0e in execute_command (command=0x70aa48) at execute_cmd.c:390
#8  0x0000000000435f23 in execute_connection (fds_to_close=0x70bb48, pipe_out=-1,
    pipe_in=-1, asynchronous=0, command=0x70bb08) at execute_cmd.c:2494
#9  execute_command_internal (command=0x70bb08, asynchronous=asynchronous@entry=0,
    pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
    fds_to_close=fds_to_close@entry=0x70bb48) at execute_cmd.c:945
#10 0x000000000047955b in parse_and_execute (string=<optimized out>,
    from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4)
    at evalstring.c:387
#11 0x00000000004205d7 in run_one_command (command=<optimized out>) at shell.c:1348
#12 0x000000000041f524 in main (argc=3, argv=0x7fffffffe198, env=0x7fffffffe1b8)
    at shell.c:695

That is, SIGINT is being handled *after* the SIGINT handler has
been restored to its default of exiting the shell.

Now, I'm not sure how best to fix that, as I suppose we don't get
any guarantee of when SIGINT will be delivered (that may be why
ksh93 ignores SIGINT altogether and relies solely on
WIFSIGNALED).

The above scenario suggests SIGCHLD is being delivered before
SIGINT, which is strange. I'd expect SIGINT to be inserted by the
kernel in both the cmd and bash queues upon CTRL-C, and the
SIGCHLD to necessarily come after those SIGINTs. Could it be that
SIGCHLD jumps the queue?

Note that I'm not seeing that equally often on every system. It
seems I can make it more likely by making the system busier.

-- 
Stephane
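
To illustrate the two outcomes the waiting shell has to tell apart when cmd ends, here is a minimal sketch. It is not taken from the bash sources: wait_status is a hypothetical helper, and the children send SIGINT to themselves instead of relying on CTRL-C, so there is no race involved. It assumes SIGINT is not already ignored in the environment it runs in.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): run a command and print the
# status the parent shell gets back from waiting on it.
wait_status() {
    if "$@"; then
        echo 0
    else
        echo "$?"
    fi
}

# Child left with the default SIGINT disposition: it is killed by the signal.
died=$(wait_status sh -c 'kill -INT $$')

# Child that catches SIGINT and calls exit itself, like "cmd" in the example.
caught=$(wait_status sh -c 'trap "exit 0" INT; kill -INT $$; :')

echo "killed by SIGINT:  status $died"    # 128+2 = 130 when run by bash or dash
echo "caught and exited: status $caught"  # 0
```

Note that a cmd doing exit(130) itself, as mksh does, would also show up as 130 in $?, so the shell-level status alone is ambiguous; only the WIFSIGNALED check on the raw wait status separates the two cases.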