date:20150924

Re: SIGINT handling

2015-09-24 Thread Pádraig Brady

On 24/09/15 07:20, Stephane Chazelas wrote:
> 2015-09-24 07:01:23 +0100, Stephane Chazelas:
>> 2015-09-23 21:27:00 -0400, Chet Ramey:
>>> On 9/19/15 5:31 PM, Stephane Chazelas wrote:
>>>
 In case it was caused by some Debian patch, I recompiled the
 code of 4.3.42 from gnu.org and the one from the devel branch on
 the git repository (commit bash-20150911 snapshot) and still:

 $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
 ^Chi
 $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
 ^Chi
 $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
 ^C
 $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
 ^Chi

 Sometimes (and the frequency of occurrences is erratic,
 generally roughly 80% of "hi"s but at times, I don't see a "hi"
 in a while), the "hi" doesn't show up. Note that I press ^C well
 after sleep has started.
>>>
>>> It would be nice to see a system call trace for this so we can check
>>> what's going on with the timing.
>>
>> I don't have them logged but I did several tests in gdb
>> with "handle SIGINT nostop pass" and as I said before, 
>> Upon the test that sets child_caught_sigint, waitpid() has not
>> returned with EINTR and wait_sigint_received has been set.
>>
>> If I break on the SIGINT handler, I see the call trace at the
>> return of the "syscall".
>>
>> I can try and get you a call trace later today.
> [...]
> 
> (gdb) handle SIGINT nostop pass
> SIGINT is used by the debugger.
> Are you sure you want to change it? (y or n) y
> SignalStop  Print   Pass to program Description
> SIGINTNoYes Yes Interrupt
> (gdb) break wait_sigint_handler
> Breakpoint 1 at 0x443a70: file jobs.c, line 2241.
> (gdb) run
> Starting program: bash-4.3/bash -c ./a\;\ echo\ x
> ^C
> Program received signal SIGINT, Interrupt.
> 
> Breakpoint 1, wait_sigint_handler (sig=2) at jobs.c:2241
> 2241{
> (gdb) bt
> #0  wait_sigint_handler (sig=2) at jobs.c:2241
> #1  
> #2  0x776bc31c in __libc_waitpid (pid=pid@entry=-1, 
> stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at 
> ../sysdeps/unix/sysv/linux/waitpid.c:31
> #3  0x00445f3d in waitchld (block=block@entry=1, wpid=5337) at 
> jobs.c:3224
> #4  0x0044733b in wait_for (pid=5337) at jobs.c:2485
> #5  0x00437992 in execute_command_internal 
> (command=command@entry=0x70bb88, asynchronous=asynchronous@entry=0, 
> pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
> fds_to_close=fds_to_close@entry=0x70bde8) at execute_cmd.c:829
> #6  0x00437b0e in execute_command (command=0x70bb88) at 
> execute_cmd.c:390
> #7  0x00435f23 in execute_connection (fds_to_close=0x70bdc8, 
> pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bd88) at 
> execute_cmd.c:2494
> #8  execute_command_internal (command=0x70bd88, 
> asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, 
> pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bdc8)
> at execute_cmd.c:945
> #9  0x0047955b in parse_and_execute (string=, 
> from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at 
> evalstring.c:387
> #10 0x004205d7 in run_one_command (command=) at 
> shell.c:1348
> #11 0x0041f524 in main (argc=3, argv=0x7fffe258, 
> env=0x7fffe278) at shell.c:695
> (gdb) frame 2
> #2  0x776bc31c in __libc_waitpid (pid=pid@entry=-1, 
> stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at 
> ../sysdeps/unix/sysv/linux/waitpid.c:31
> 31  ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
> (gdb) disassemble
> Dump of assembler code for function __libc_waitpid:
>0x776bc300 <+0>: mov0x2f14cd(%rip),%r9d# 
> 0x779ad7d4 <__libc_multiple_threads>
>0x776bc307 <+7>: test   %r9d,%r9d
>0x776bc30a <+10>:jne0x776bc336 <__libc_waitpid+54>
>0x776bc30c <+12>:xor%r10d,%r10d
>0x776bc30f <+15>:movslq %edx,%rdx
>0x776bc312 <+18>:movslq %edi,%rdi
>0x776bc315 <+21>:mov$0x3d,%eax
>0x776bc31a <+26>:syscall
> => 0x776bc31c <+28>:cmp$0xf000,%rax
>0x776bc322 <+34>:ja 0x776bc325 <__libc_waitpid+37>
>0x776bc324 <+36>:retq
>0x776bc325 <+37>:mov0x2ebb3c(%rip),%rdx# 
> 0x779a7e68
>0x776bc32c <+44>:neg%eax
>0x776bc32e <+46>:mov%eax,%fs:(%rdx)
>0x776bc331 <+49>:or $0x,%rax
> (gdb) fin
> Run till exit from #2  0x776bc31c in __libc_waitpid 
> (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffdbc8, 
> options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
> 0x00445f3d in waitchld (block=block@entry=1, wpid=5481) at jobs.c:3224
> 3224  pid = WAITPID (-1, &status, waitpid_flags);
> V

Re: SIGINT handling

2015-09-24 Thread Stephane Chazelas

2015-09-24 09:36:08 +0100, Pádraig Brady:
[...]
> > (gdb) handle SIGINT nostop pass
[...]
> 
> In case it's relevant, I'm not entirely sure of gdb's signal handling:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18364

Yes, I wondered about that.

I'd expect the "handle SIGINT nostop pass", to take gdb out of
the loop, but I've not verified it and I suspect ptracing could
have side effects.

It's easy to corroborate with printfs though here which I just
did:

$ ./bash -c './a; echo x'
^Cwait_sigint_received=1 pid=-1
wait_sigint_received=1 pid=956
x
$ ./bash -c './a; echo x'
^Cwait_sigint_received=1 pid=958

$ diff -pu jobs.c\~ jobs.c
--- jobs.c~ 2015-09-20 20:03:14.692119372 +0100
+++ jobs.c  2015-09-24 11:49:03.963122465 +0100
@@ -3262,6 +3262,7 @@ itrace("waitchld: waitpid returns %d blo
 require the child to actually die due to SIGINT to act on the
 SIGINT we received; otherwise we assume the child handled it and
 let it go. */
+  fprintf(stderr, "wait_sigint_received=%d pid=%d\n", 
wait_sigint_received, pid);
   if (pid < 0 && errno == EINTR && wait_sigint_received)
child_caught_sigint = 1;

-- 
Stephane

Re: SIGINT handling

2015-09-24 Thread Stephane Chazelas

Given that the bug was introduced by Linus' patch (to fix a bug
that anyway is in all shell implementations that do WCE) and
that it's caused by a behaviour that seems to be specific to the
Linux kernel (that the kernel seems to be messing up with the
order of delivery of the SIGCHLD (or return from waitpid()) and
SIGINT), we may want to bring the issue up to him.

Here, the behaviour could be seen as a kernel bug, since the
child should clearly die *after* the SIGINT has been issued to
the parent (since the ^C should insert the SIGINT in the signal
queue of both parent and child processes at the same time) so
it's wrong for SIGINT to be handled *after* waitpid() returns.

But of course one can also argue that the order of signal
delivery is not guaranteed in general anyway.

IMO, the best approach would be to give up on WCE altogether
which is more source of frustration anyway than it has ever
helped. I live very well with a /bin/sh (dash) and interactive
shell (zsh) that don't do it.

WCE may be good in a perfect world where everything does it
(everything that calls waitpid() without using system(3)), but
if not, I hardly see the point.

What's the point of bash doing it when sh, find -exec, xargs,
watch, git (like in that emacs bug report) don't do it.

it seems to me that finding another way to address it (like
emacs approach of putting itself on its own in a new forground
job if it's not already a process group leader) for the rare
cases where it's useful (like the vi -> :! case) would be
better.

-- 
Stephane

Re: SIGINT handling

2015-09-24 Thread Chet Ramey

On 9/22/15 8:18 AM, Greg Wooledge wrote:
> On Mon, Sep 21, 2015 at 10:07:55PM +0100, Stephane Chazelas wrote:
>> Maybe the test scenario was not clear:
>>
>> bash -c 'cmd; echo hi'
>>
>> is run from an interactive shell, cmd is a long running
>> application (the problem that sparked this discussion was with
>> ping and I showed examples with an inline-script calling sleep)
> 
> Just for the record, ping is the *classic* example of an incorrectly
> written application that traps SIGINT but doesn't kill itself with
> SIGINT afterward.  (This seems to be true on multiple systems -- at
> the very least, HP-UX and Linux pings both suffer from it.)
> 
> A loop like this works as expected:
> 
> while true; do
>   sleep 1
> done
> 
> A loop like this does not:
> 
> while true; do
>   ping -c 1 some.host # or on HP-UX, ping some.host -n 1
> done

If you decide, as bash has, to allow the foreground job to determine what
to do with SIGINT, you have to cope with software like ping.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/

Re: SIGINT handling

2015-09-24 Thread Chet Ramey

On 9/24/15 9:57 AM, Stephane Chazelas wrote:

> IMO, the best approach would be to give up on WCE altogether
> which is more source of frustration anyway than it has ever
> helped. I live very well with a /bin/sh (dash) and interactive
> shell (zsh) that don't do it.

We'll agree to disagree.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/

Re: SIGINT handling

2015-09-24 Thread Stephane Chazelas

2015-09-24 14:53:16 -0400, Chet Ramey:
> On 9/24/15 9:57 AM, Stephane Chazelas wrote:
> 
> > IMO, the best approach would be to give up on WCE altogether
> > which is more source of frustration anyway than it has ever
> > helped. I live very well with a /bin/sh (dash) and interactive
> > shell (zsh) that don't do it.
> 
> We'll agree to disagree.
[...]

Now that we're settled on WCE,

would you agree that

a=$(cmd-that-catches-sigint)

should behave like

(cmd-that-catches-sigint)

(as in, not exit the shell as per WCE)?

What about $PIPESTATUS?

In:

cmd-that-catches-sigint | cmd-that-does-not
or
cmd-that-does-not | cmd-that-catches-sigint

Should we exit on SIGINT or leave that command run in
background?

Should pipefail have an influence on the behaviour? What about
lastpipe?

What about when using the wait builtin?

Why should:

cmd & wait "$!"

be treated differently from

cmd

?

Because cmd's stdin is /dev/null and so is unlikely to be an
interactive command?

So we admit WCE is a kludge 

-- 
Stephane

Re: SIGINT handling

2015-09-24 Thread Chet Ramey

On 9/21/15 5:07 PM, Stephane Chazelas wrote:

> The problem is that here the parent's SIGINT handler is run upon
> the return from waitpid(), just after. My patch doesn't rely on
> EINTR from waitpid() (which doesn't happen here, waitpid() returns
> with the pid of the child that did an exit() upon receiving
> SIGINT), just on the "status" returned by the child, so doesn't
> have the problem.

I wonder if the kernel is restarting the waitpid() even though the
signal handler was installed without SA_RESTART.

> What do you suggest we do to fix that issue?

I think your additional test for wait_sigint_received coupled with a
check that the child died for some other reason than SIGINT, in addition
to the EINTR test, is a reasonable fix.

>> This still counts as catching and handling the SIGINT, and the shell
>> should not act as if the foreground process died as a result of one.
> 
> That's the point I'm arguing on.
> 
> If the command handled SIGINT and returned with 130, I argue it
> is considering itself and telling its parent as having been
> "interrupted"

No, it's not.  If a shell exits with status 130, it's saying that the last
command it executed was killed by SIGINT, or happened to exit with status
130 for some random reason.  It's very possible for a non-interactive
shell to restore its original SIGINT handler and resend SIGINT to itself,
if it's concerned about telling its parent that it's been interrupted.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/

[PATCH] input: fix logic bug that breaks on O_TEXT systems

2015-09-24 Thread Eric Blake

The bash-20101229 snapshot introduced an attempt to incorporate
some patches that were previously used downstream by Cygwin
with regards to managing scripts read with O_TEXT mode (where
lseek() sees different offsets than the number of bytes read,
because the system is converting physical '\r\n' into logical
'\n').  But it introduced a typo, using O_TEXT where it meant
to use B_TEXT, with the result that scripts read in text mode
would seek to the wrong location.

Thanks to Jeff Downs for helping me find the typo.

Signed-off-by: Eric Blake 
---
 input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/input.c b/input.c
index 2a9ca26..308b87e 100644
--- a/input.c
+++ b/input.c
@@ -205,7 +205,7 @@ make_buffered_stream (fd, buffer, bufsize)
   if (bufsize == 1)
 bp->b_flag |= B_UNBUFF;
   if (O_TEXT && (fcntl (fd, F_GETFL) & O_TEXT) != 0)
-bp->b_flag |= O_TEXT;
+bp->b_flag |= B_TEXT;
   return (bp);
 }

-- 
2.4.3

Re: [PATCH] input: fix logic bug that breaks on O_TEXT systems

2015-09-24 Thread Chet Ramey

On 9/24/15 5:47 PM, Eric Blake wrote:
> The bash-20101229 snapshot introduced an attempt to incorporate
> some patches that were previously used downstream by Cygwin
> with regards to managing scripts read with O_TEXT mode (where
> lseek() sees different offsets than the number of bytes read,
> because the system is converting physical '\r\n' into logical
> '\n').  But it introduced a typo, using O_TEXT where it meant
> to use B_TEXT, with the result that scripts read in text mode
> would seek to the wrong location.

Good catch.  Thanks for the report and fix.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/

Re: SIGINT handling

Re: SIGINT handling

Re: SIGINT handling

Re: SIGINT handling

Re: SIGINT handling

Re: SIGINT handling

Re: SIGINT handling

[PATCH] input: fix logic bug that breaks on O_TEXT systems

Re: [PATCH] input: fix logic bug that breaks on O_TEXT systems

9 matches

Site Navigation

Mail list logo

Footer information