Re: SIGINT handling

2015-09-23 Thread Chet Ramey
On 9/19/15 5:31 PM, Stephane Chazelas wrote:

> In case it was caused by some Debian patch, I recompiled the
> code of 4.3.42 from gnu.org and the one from the devel branch on
> the git repository (commit bash-20150911 snapshot) and still:
> 
> $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> ^Chi
> $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> ^Chi
> $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> ^C
> $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> ^Chi
> 
> Sometimes (and the frequency of occurrences is erratic,
> generally roughly 80% of "hi"s but at times, I don't see a "hi"
> in a while), the "hi" doesn't show up. Note that I press ^C well
> after sleep has started.

It would be nice to see a system call trace for this so we can check
what's going on with the timing.

Can you reproduce this on anything other than Debian?  I'm wondering
whether it's a Linux-4 kernel phenomenon.  Plus I don't have any
Debian machines laying around.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/



Re: SIGINT handling

2015-09-23 Thread Chet Ramey
On 9/20/15 11:52 AM, Stephane Chazelas wrote:

> When the above code exits without printing "hi", we see this
> call stack for instance (breakpoint on kill() in gdb):
> 
> #0  kill () at ../sysdeps/unix/syscall-template.S:81
> #1  0x0045dd8e in termsig_handler (sig=) at sig.c:588
> #2  0x0045ddef in termsig_handler (sig=) at sig.c:554
> #3  0x004466bb in set_job_status_and_cleanup (job=0) at jobs.c:3539
> #4  waitchld (block=block@entry=1, wpid=20802) at jobs.c:3316
> #5  0x0044733b in wait_for (pid=20802) at jobs.c:2485
> #6  0x00437992 in execute_command_internal 
> (command=command@entry=0x70aa48, asynchronous=asynchronous@entry=0, 
> pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
> fds_to_close=fds_to_close@entry=0x70bb68) at execute_cmd.c:829
> #7  0x00437b0e in execute_command (command=0x70aa48) at 
> execute_cmd.c:390
> #8  0x00435f23 in execute_connection (fds_to_close=0x70bb48, 
> pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bb08) at 
> execute_cmd.c:2494
> #9  execute_command_internal (command=0x70bb08, 
> asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, 
> pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bb48)
> at execute_cmd.c:945
> #10 0x0047955b in parse_and_execute (string=, 
> from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at 
> evalstring.c:387
> #11 0x004205d7 in run_one_command (command=) at 
> shell.c:1348
> #12 0x0041f524 in main (argc=3, argv=0x7fffe198, 
> env=0x7fffe1b8) at shell.c:695
> 
> That is, SIGINT is being handled *after* the SIGINT handler has
> been restored to its default of exiting the shell.

An alternate explanation is that somehow the shell is forgetting that
SIGINT is trapped.  I don't see how or why that would happen, but I
don't have enough information to determine whether that's the case.

> Now, I'm not sure how to best fix that as I suppose we don't get
> any guarantee of when SIGINT will be delivered (it may be why
> ksh93 ignores SIGINT altogether and relies solely on
> WIFSIGNALED)
> 
> The above scenario suggests SIGCHLD is being delivered before
> SIGINT which is strange. I'd expect SIGINT to be inserted by the
> kernel in both cmd and bash queues upon CTRL-C, and the SIGCHLD
> would necesarily come after those SIGINT. Could it be that
> SIGCHLD jumps the queue?

The above scenario doesn't suggest that SIGCHLD is being delivered at
all.  The shell is doing a blocking waitpid for a specific pid.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/



Re: SIGINT handling

2015-09-23 Thread Stephane Chazelas
2015-09-23 21:27:00 -0400, Chet Ramey:
> On 9/19/15 5:31 PM, Stephane Chazelas wrote:
> 
> > In case it was caused by some Debian patch, I recompiled the
> > code of 4.3.42 from gnu.org and the one from the devel branch on
> > the git repository (commit bash-20150911 snapshot) and still:
> > 
> > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> > ^Chi
> > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> > ^Chi
> > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> > ^C
> > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> > ^Chi
> > 
> > Sometimes (and the frequency of occurrences is erratic,
> > generally roughly 80% of "hi"s but at times, I don't see a "hi"
> > in a while), the "hi" doesn't show up. Note that I press ^C well
> > after sleep has started.
> 
> It would be nice to see a system call trace for this so we can check
> what's going on with the timing.

I don't have them logged but I did several tests in gdb
with "handle SIGINT nostop pass" and as I said before, 
Upon the test that sets child_caught_sigint, waitpid() has not
returned with EINTR and wait_sigint_received has been set.

If I break on the SIGINT handler, I see the call trace at the
return of the "syscall".

I can try and get you a call trace later today.

> 
> Can you reproduce this on anything other than Debian?  I'm wondering
> whether it's a Linux-4 kernel phenomenon.  Plus I don't have any
> Debian machines laying around.


It's hard to reproduce on an idle system. It's relatively easy
to reproduce on a busy one and if the "cmd" exits shortly after
receiving its SIGINT. I can reproduce on a Ubuntu 14.04 with an
older kernel (3.13). I can't reproduce on FreeBSD (in a VM
though).


cmd ==
#include 
main() {signal(2,_exit);pause();}

$ tar zcf - / >& /dev/null &
[1] 4417
$ tar zcf - / >& /dev/null &
[2] 4419
$ tar zcf - / >& /dev/null &
[3] 4421
$ bash -c './a.out; echo x'
^Cx
$ bash -c './a.out; echo x'
^C

Works on second attempt.

-- 
Stephane



Re: SIGINT handling

2015-09-23 Thread Stephane Chazelas
2015-09-24 07:01:23 +0100, Stephane Chazelas:
> 2015-09-23 21:27:00 -0400, Chet Ramey:
> > On 9/19/15 5:31 PM, Stephane Chazelas wrote:
> > 
> > > In case it was caused by some Debian patch, I recompiled the
> > > code of 4.3.42 from gnu.org and the one from the devel branch on
> > > the git repository (commit bash-20150911 snapshot) and still:
> > > 
> > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> > > ^Chi
> > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> > > ^Chi
> > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> > > ^C
> > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> > > ^Chi
> > > 
> > > Sometimes (and the frequency of occurrences is erratic,
> > > generally roughly 80% of "hi"s but at times, I don't see a "hi"
> > > in a while), the "hi" doesn't show up. Note that I press ^C well
> > > after sleep has started.
> > 
> > It would be nice to see a system call trace for this so we can check
> > what's going on with the timing.
> 
> I don't have them logged but I did several tests in gdb
> with "handle SIGINT nostop pass" and as I said before, 
> Upon the test that sets child_caught_sigint, waitpid() has not
> returned with EINTR and wait_sigint_received has been set.
> 
> If I break on the SIGINT handler, I see the call trace at the
> return of the "syscall".
> 
> I can try and get you a call trace later today.
[...]

(gdb) handle SIGINT nostop pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
SignalStop  Print   Pass to program Description
SIGINTNoYes Yes Interrupt
(gdb) break wait_sigint_handler
Breakpoint 1 at 0x443a70: file jobs.c, line 2241.
(gdb) run
Starting program: bash-4.3/bash -c ./a\;\ echo\ x
^C
Program received signal SIGINT, Interrupt.

Breakpoint 1, wait_sigint_handler (sig=2) at jobs.c:2241
2241{
(gdb) bt
#0  wait_sigint_handler (sig=2) at jobs.c:2241
#1  
#2  0x776bc31c in __libc_waitpid (pid=pid@entry=-1, 
stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at 
../sysdeps/unix/sysv/linux/waitpid.c:31
#3  0x00445f3d in waitchld (block=block@entry=1, wpid=5337) at 
jobs.c:3224
#4  0x0044733b in wait_for (pid=5337) at jobs.c:2485
#5  0x00437992 in execute_command_internal 
(command=command@entry=0x70bb88, asynchronous=asynchronous@entry=0, 
pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
fds_to_close=fds_to_close@entry=0x70bde8) at execute_cmd.c:829
#6  0x00437b0e in execute_command (command=0x70bb88) at 
execute_cmd.c:390
#7  0x00435f23 in execute_connection (fds_to_close=0x70bdc8, 
pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bd88) at execute_cmd.c:2494
#8  execute_command_internal (command=0x70bd88, 
asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, 
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bdc8)
at execute_cmd.c:945
#9  0x0047955b in parse_and_execute (string=, 
from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at 
evalstring.c:387
#10 0x004205d7 in run_one_command (command=) at 
shell.c:1348
#11 0x0041f524 in main (argc=3, argv=0x7fffe258, 
env=0x7fffe278) at shell.c:695
(gdb) frame 2
#2  0x776bc31c in __libc_waitpid (pid=pid@entry=-1, 
stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at 
../sysdeps/unix/sysv/linux/waitpid.c:31
31  ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
(gdb) disassemble
Dump of assembler code for function __libc_waitpid:
   0x776bc300 <+0>: mov0x2f14cd(%rip),%r9d# 
0x779ad7d4 <__libc_multiple_threads>
   0x776bc307 <+7>: test   %r9d,%r9d
   0x776bc30a <+10>:jne0x776bc336 <__libc_waitpid+54>
   0x776bc30c <+12>:xor%r10d,%r10d
   0x776bc30f <+15>:movslq %edx,%rdx
   0x776bc312 <+18>:movslq %edi,%rdi
   0x776bc315 <+21>:mov$0x3d,%eax
   0x776bc31a <+26>:syscall
=> 0x776bc31c <+28>:cmp$0xf000,%rax
   0x776bc322 <+34>:ja 0x776bc325 <__libc_waitpid+37>
   0x776bc324 <+36>:retq
   0x776bc325 <+37>:mov0x2ebb3c(%rip),%rdx# 
0x779a7e68
   0x776bc32c <+44>:neg%eax
   0x776bc32e <+46>:mov%eax,%fs:(%rdx)
   0x776bc331 <+49>:or $0x,%rax
(gdb) fin
Run till exit from #2  0x776bc31c in __libc_waitpid (pid=pid@entry=-1, 
stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at 
../sysdeps/unix/sysv/linux/waitpid.c:31
0x00445f3d in waitchld (block=block@entry=1, wpid=5481) at jobs.c:3224
3224  pid = WAITPID (-1, &status, waitpid_flags);
Value returned is $5 = 5337
(gdb) p wait_sigint_received
$6 = 1

In the other (working) cases, the difference is that waitpid()
returs -1 EINTR instead.

Note that Bart on the zsh mailing