Re: SIGINT handling
On 9/19/15 5:31 PM, Stephane Chazelas wrote: > In case it was caused by some Debian patch, I recompiled the > code of 4.3.42 from gnu.org and the one from the devel branch on > the git repository (commit bash-20150911 snapshot) and still: > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > ^Chi > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > ^Chi > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > ^C > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > ^Chi > > Sometimes (and the frequency of occurrences is erratic, > generally roughly 80% of "hi"s but at times, I don't see a "hi" > in a while), the "hi" doesn't show up. Note that I press ^C well > after sleep has started. It would be nice to see a system call trace for this so we can check what's going on with the timing. Can you reproduce this on anything other than Debian? I'm wondering whether it's a Linux-4 kernel phenomenon. Plus I don't have any Debian machines laying around. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
On 9/20/15 11:52 AM, Stephane Chazelas wrote: > When the above code exits without printing "hi", we see this > call stack for instance (breakpoint on kill() in gdb): > > #0 kill () at ../sysdeps/unix/syscall-template.S:81 > #1 0x0045dd8e in termsig_handler (sig=) at sig.c:588 > #2 0x0045ddef in termsig_handler (sig=) at sig.c:554 > #3 0x004466bb in set_job_status_and_cleanup (job=0) at jobs.c:3539 > #4 waitchld (block=block@entry=1, wpid=20802) at jobs.c:3316 > #5 0x0044733b in wait_for (pid=20802) at jobs.c:2485 > #6 0x00437992 in execute_command_internal > (command=command@entry=0x70aa48, asynchronous=asynchronous@entry=0, > pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, > fds_to_close=fds_to_close@entry=0x70bb68) at execute_cmd.c:829 > #7 0x00437b0e in execute_command (command=0x70aa48) at > execute_cmd.c:390 > #8 0x00435f23 in execute_connection (fds_to_close=0x70bb48, > pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bb08) at > execute_cmd.c:2494 > #9 execute_command_internal (command=0x70bb08, > asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, > pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bb48) > at execute_cmd.c:945 > #10 0x0047955b in parse_and_execute (string=, > from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at > evalstring.c:387 > #11 0x004205d7 in run_one_command (command=) at > shell.c:1348 > #12 0x0041f524 in main (argc=3, argv=0x7fffe198, > env=0x7fffe1b8) at shell.c:695 > > That is, SIGINT is being handled *after* the SIGINT handler has > been restored to its default of exiting the shell. An alternate explanation is that somehow the shell is forgetting that SIGINT is trapped. I don't see how or why that would happen, but I don't have enough information to determine whether that's the case. > Now, I'm not sure how to best fix that as I suppose we don't get > any guarantee of when SIGINT will be delivered (it may be why > ksh93 ignores SIGINT altogether and relies solely on > WIFSIGNALED) > > The above scenario suggests SIGCHLD is being delivered before > SIGINT which is strange. I'd expect SIGINT to be inserted by the > kernel in both cmd and bash queues upon CTRL-C, and the SIGCHLD > would necesarily come after those SIGINT. Could it be that > SIGCHLD jumps the queue? The above scenario doesn't suggest that SIGCHLD is being delivered at all. The shell is doing a blocking waitpid for a specific pid. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
2015-09-23 21:27:00 -0400, Chet Ramey: > On 9/19/15 5:31 PM, Stephane Chazelas wrote: > > > In case it was caused by some Debian patch, I recompiled the > > code of 4.3.42 from gnu.org and the one from the devel branch on > > the git repository (commit bash-20150911 snapshot) and still: > > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > ^Chi > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > ^Chi > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > ^C > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > ^Chi > > > > Sometimes (and the frequency of occurrences is erratic, > > generally roughly 80% of "hi"s but at times, I don't see a "hi" > > in a while), the "hi" doesn't show up. Note that I press ^C well > > after sleep has started. > > It would be nice to see a system call trace for this so we can check > what's going on with the timing. I don't have them logged but I did several tests in gdb with "handle SIGINT nostop pass" and as I said before, Upon the test that sets child_caught_sigint, waitpid() has not returned with EINTR and wait_sigint_received has been set. If I break on the SIGINT handler, I see the call trace at the return of the "syscall". I can try and get you a call trace later today. > > Can you reproduce this on anything other than Debian? I'm wondering > whether it's a Linux-4 kernel phenomenon. Plus I don't have any > Debian machines laying around. It's hard to reproduce on an idle system. It's relatively easy to reproduce on a busy one and if the "cmd" exits shortly after receiving its SIGINT. I can reproduce on a Ubuntu 14.04 with an older kernel (3.13). I can't reproduce on FreeBSD (in a VM though). cmd == #include main() {signal(2,_exit);pause();} $ tar zcf - / >& /dev/null & [1] 4417 $ tar zcf - / >& /dev/null & [2] 4419 $ tar zcf - / >& /dev/null & [3] 4421 $ bash -c './a.out; echo x' ^Cx $ bash -c './a.out; echo x' ^C Works on second attempt. -- Stephane
Re: SIGINT handling
2015-09-24 07:01:23 +0100, Stephane Chazelas: > 2015-09-23 21:27:00 -0400, Chet Ramey: > > On 9/19/15 5:31 PM, Stephane Chazelas wrote: > > > > > In case it was caused by some Debian patch, I recompiled the > > > code of 4.3.42 from gnu.org and the one from the devel branch on > > > the git repository (commit bash-20150911 snapshot) and still: > > > > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > ^Chi > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > ^Chi > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > ^C > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > ^Chi > > > > > > Sometimes (and the frequency of occurrences is erratic, > > > generally roughly 80% of "hi"s but at times, I don't see a "hi" > > > in a while), the "hi" doesn't show up. Note that I press ^C well > > > after sleep has started. > > > > It would be nice to see a system call trace for this so we can check > > what's going on with the timing. > > I don't have them logged but I did several tests in gdb > with "handle SIGINT nostop pass" and as I said before, > Upon the test that sets child_caught_sigint, waitpid() has not > returned with EINTR and wait_sigint_received has been set. > > If I break on the SIGINT handler, I see the call trace at the > return of the "syscall". > > I can try and get you a call trace later today. [...] (gdb) handle SIGINT nostop pass SIGINT is used by the debugger. Are you sure you want to change it? (y or n) y SignalStop Print Pass to program Description SIGINTNoYes Yes Interrupt (gdb) break wait_sigint_handler Breakpoint 1 at 0x443a70: file jobs.c, line 2241. (gdb) run Starting program: bash-4.3/bash -c ./a\;\ echo\ x ^C Program received signal SIGINT, Interrupt. Breakpoint 1, wait_sigint_handler (sig=2) at jobs.c:2241 2241{ (gdb) bt #0 wait_sigint_handler (sig=2) at jobs.c:2241 #1 #2 0x776bc31c in __libc_waitpid (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31 #3 0x00445f3d in waitchld (block=block@entry=1, wpid=5337) at jobs.c:3224 #4 0x0044733b in wait_for (pid=5337) at jobs.c:2485 #5 0x00437992 in execute_command_internal (command=command@entry=0x70bb88, asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bde8) at execute_cmd.c:829 #6 0x00437b0e in execute_command (command=0x70bb88) at execute_cmd.c:390 #7 0x00435f23 in execute_connection (fds_to_close=0x70bdc8, pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bd88) at execute_cmd.c:2494 #8 execute_command_internal (command=0x70bd88, asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bdc8) at execute_cmd.c:945 #9 0x0047955b in parse_and_execute (string=, from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at evalstring.c:387 #10 0x004205d7 in run_one_command (command=) at shell.c:1348 #11 0x0041f524 in main (argc=3, argv=0x7fffe258, env=0x7fffe278) at shell.c:695 (gdb) frame 2 #2 0x776bc31c in __libc_waitpid (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31 31 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory. (gdb) disassemble Dump of assembler code for function __libc_waitpid: 0x776bc300 <+0>: mov0x2f14cd(%rip),%r9d# 0x779ad7d4 <__libc_multiple_threads> 0x776bc307 <+7>: test %r9d,%r9d 0x776bc30a <+10>:jne0x776bc336 <__libc_waitpid+54> 0x776bc30c <+12>:xor%r10d,%r10d 0x776bc30f <+15>:movslq %edx,%rdx 0x776bc312 <+18>:movslq %edi,%rdi 0x776bc315 <+21>:mov$0x3d,%eax 0x776bc31a <+26>:syscall => 0x776bc31c <+28>:cmp$0xf000,%rax 0x776bc322 <+34>:ja 0x776bc325 <__libc_waitpid+37> 0x776bc324 <+36>:retq 0x776bc325 <+37>:mov0x2ebb3c(%rip),%rdx# 0x779a7e68 0x776bc32c <+44>:neg%eax 0x776bc32e <+46>:mov%eax,%fs:(%rdx) 0x776bc331 <+49>:or $0x,%rax (gdb) fin Run till exit from #2 0x776bc31c in __libc_waitpid (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31 0x00445f3d in waitchld (block=block@entry=1, wpid=5481) at jobs.c:3224 3224 pid = WAITPID (-1, &status, waitpid_flags); Value returned is $5 = 5337 (gdb) p wait_sigint_received $6 = 1 In the other (working) cases, the difference is that waitpid() returs -1 EINTR instead. Note that Bart on the zsh mailing