[BUG] Bash not reacting to Ctrl-C

2011-02-08 Thread Oleg Nesterov
Date: Fri, 28 Jan 2011 13:36:50 -0500
From: Mathieu Desnoyers 
To: Anca Emanuel 
Cc: Thomas Gleixner , Ingo Molnar ,
Tejun Heo , rol...@redhat.com, o...@redhat.com,
jan.kratoch...@redhat.com, linux-ker...@vger.kernel.org,
torva...@linux-foundation.org, a...@linux-foundation.org,
Peter Zijlstra ,
Frédéric Weisbecker
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
Message-ID: <20110128183650.GA26633@Krystal>

* Anca Emanuel (anca.eman...@gmail.com) wrote:
> On Fri, Jan 28, 2011 at 7:41 PM, Thomas Gleixner  wrote:
> > On Fri, 28 Jan 2011, Ingo Molnar wrote:
> >> See that '^C^C' line? That is where i had to do Ctrl-C twice.
> >>
> > >> It only fails here about once every 10 times, so it's very rare. I have a
> > >> stock F14 system running on that box, with the very latest .38 based kernel.
> >
> > Tripped over the refuse ^C thing today twice. Had to kill a kernel
> > build from another shell. It just happily displayed ^C and never
> > stopped. That happens once in a while and I have no idea either how to
> > debug that.
> 
> cc: Mathieu
> 
> Use lttng ?

Heh :) I'm sure Ingo and Thomas have their own tools for that ;) There is
one extra thing in the LTTng instrumentation that can help solve this problem:
the "input subsystem" instrumentation (enabled with ltt-armall -i). You can then
get a dump of:

- Your keystrokes (you can then grep for your ctrl-c input)
- Read/poll/select system calls (so you know when your terminal receives the
  input).
- Signals sent/delivered

Some of these are already instrumented in the mainline kernel, so you might get
away without the input subsystem instrumentation.

If I had to take a wild guess, my bet would be to take a look in the area of
signal delivery, but you never know, maybe it's a userspace bug in the X
terminal emulator code that is causing this weirdness.

Hope this helps,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

Date: Fri, 28 Jan 2011 18:55:33 +0100
From: Oleg Nesterov 
To: Ingo Molnar 
Cc: Tejun Heo , rol...@redhat.com, jan.kratoch...@redhat.com,
linux-ker...@vger.kernel.org, torva...@linux-foundation.org,
a...@linux-foundation.org, Peter Zijlstra ,
Thomas Gleixner ,
Frédéric Weisbecker
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
Message-ID: <20110128175532.ga26...@redhat.com>

On 01/28, Ingo Molnar wrote:
>
> The bug is that occasionally Ctrl-C does not get processed, and that the
> Ctrl-C is 'lost'. It can be reproduced here by running ./test-signal several
> times, and Ctrl-C-ing it:
>
>  $ ./test-signal
>  ^C
>  $ ./test-signal
>  ^C^C
>  $ ./test-signal
>  ^C
>
> See that '^C^C' line? That is where i had to do Ctrl-C twice.

Reproduced.

At first glance, /bin/sh is to blame... Hmm, probably yes. I even
reproduced this under strace, and this is what I see:

 

Re: [BUG] Bash not reacting to Ctrl-C

2011-02-08 Thread Oleg Nesterov
On 02/08, Chet Ramey wrote:
>
> On 2/8/11 1:21 PM, Oleg Nesterov wrote:
> > Hello,
> >
> > We believe that the non-interactive bash doesn't handle CTRL-C
> > correctly, please look into the attached thread from lkml for
> > more details.
>
> Read http://www.cons.org/cracauer/sigint.html

Oooh... it is huge! I will try tomorrow.

> and see if you still
> feel the same way.

Which way? ;)

Please note that I wasn't sure when I sent this bug report, although as
a bash user I certainly dislike the fact that you can never interrupt a
shell script reliably. Let's return to the first example:

$ sh -c 'while true; do /bin/true; done'

Do you think it is OK to miss ^C in this case?

Once again, I won't insist if you think this is fine, and I'll try
to read the docs above tomorrow. But I would very much appreciate it if
you could explain why exactly this is fine. So far I am looking at the

WUE shell would not have this problem, since they discontinue
the script on their own. But as I said, they don't support
programs using SIGINT for non-exiting purposes

part of the documentation, but I can't understand it.

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-08 Thread Oleg Nesterov
On 02/08, Bob Proulx wrote:
>
> Oleg Nesterov wrote:
> > $ sh -c 'while true; do /bin/true; done'
>
> Be careful that 'sh' is actually 'bash'.  It isn't on a lot of
> machines.  To ensure that you are actually running bash you should
> call bash explicitly.  (At least we can't assume you are running bash
> otherwise.)

It is. In fact, I ran "./bash" while testing.

> Is the behavior you observe any different for this case?
>
>   $ bash -c 'while true; do /bin/true || exit 1; done'
>
> Or different for this case?
>
>   $ bash -e -c 'while true; do /bin/true; done'

The same.

I do not know what "-e" does (and I can't find it in the man page), but how
can this make a difference?

Once again: if bash gets ^C at the same time as the current foreground
child exits normally (either because this job-control signal races with
exit(), or because the child catches SIGINT and exits after that), the
SIGINT is lost.

set_job_status_and_cleanup() insists that WTERMSIG(child->status) be
SIGINT, i.e. that the child was killed by the same signal. Otherwise
bash does not kill itself, and the next wait_for() clears
wait_sigint_received.

This all looks intentional, but this means ^C can never work reliably.
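A small standalone C program can illustrate the check described above. This is a minimal sketch, not bash's code: `killed_by_sigint()` and `child_status_after_sigint()` are hypothetical helpers. A child that keeps the default SIGINT action is reported by waitpid() as WIFSIGNALED() with WTERMSIG() == SIGINT. A child that catches SIGINT and then calls _exit(0) looks like an ordinary exit, which is the case where, per the analysis above, the shell does not act on its own SIGINT.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper mirroring the check discussed above:
   did the child die because of SIGINT? */
static int killed_by_sigint(int status)
{
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGINT;
}

/* Handler for the "catching" child: treat SIGINT as a request
   to exit normally, like the perl one-liners in this thread. */
static void exit_on_sigint(int sig)
{
    (void)sig;
    _exit(0);
}

/* Install a SIGINT disposition and make sure SIGINT is not blocked;
   it may have been inherited as ignored or blocked. */
static void set_sigint(void (*handler)(int))
{
    struct sigaction sa;
    sigset_t set;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Fork a child that sends SIGINT to itself, either keeping the
   default action or catching it and exiting with status 0.
   Returns the raw wait status, or -1 on error. */
static int child_status_after_sigint(int catch_sigint)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        set_sigint(catch_sigint ? exit_on_sigint : SIG_DFL);
        raise(SIGINT);
        _exit(1); /* not reached in either case */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

From the parent's side, the second status cannot be told apart from a child that exited with status 0 without ever seeing ^C: the wait status carries no record of the signal.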

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-09 Thread Oleg Nesterov
On 02/08, Chet Ramey wrote:
>
> On 2/8/11 4:17 PM, Oleg Nesterov wrote:
>
> > Once again. If bash gets ^C and at the same time the current foreground
> > child exits normally (either because this jctl signal races with exit()
> > or because the child hooks SIGINT and exits after that) SIGINT is lost.
> >
> > set_job_status_and_cleanup() insists that WTERMSIG(child->status) should
> > be SIGINT, iow the child should be killed by the same signal. Otherwise
> > it is not going to kill itself, and the next wait_for() clears
> > wait_sigint_received.
> >
> > This all looks intentional, but this means ^C can never work reliably.
>
> It depends on what you mean by `reliably'.

Sure, I understand that it is not that simple.

> Consider a script that runs
> emacs, then does other processing when emacs completes.  Emacs uses SIGINT
> internally to interrupt editing commands, but handles it and does not exit
> as a result.  Since emacs is run from a script, and job control is not
> enabled, the shell receives the SIGINT also, because it is in the
> terminal's foreground process group.  Should the shell abort the script
> when emacs exits?

In my opinion, it should. But yes, I know almost nothing about job
control (at least the non-kernel part), and I agree this behaviour can
confuse a user too.

That is why I provided another test case; let me repeat it:

#!./bash

perl -we '$SIG{INT} = sub {exit}; sleep'

echo "Hehe, I am going to sleep after ^C"
sleep 100

If a user presses ^C, the shell can't know what he wants: to kill the
script, or to send the signal to the current job only.

However, I think the shell should react and exit, precisely because it
runs in the same foreground process group. If the user doesn't want
this behaviour, he can change the script, say:

#!./bash

trap true SIGINT
perl -we '$SIG{INT} = sub {exit}; sleep'
trap - SIGINT

echo "OK, WCE mode makes sense sometimes"
sleep 100

Better yet, perhaps bash could have a new command/builtin which does
setpgid() and TIOCSPGRP before running the command.
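Oleg's builtin is only a proposal; the sketch below merely illustrates what "setpgid() and TIOCSPGRP before running the command" could look like in C. `run_in_own_pgrp()` is a hypothetical name, and tcsetpgrp() is used as the POSIX interface to TIOCSPGRP. The child gets a new process group, which becomes the terminal's foreground group, so a ^C typed at the terminal goes to that group and not to the shell. The shell takes the terminal back once the command finishes. When standard input is not the controlling terminal, the terminal steps are skipped.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv in a new process group that owns the
   terminal while it runs.  Returns the wait status, or -1 on error. */
static int run_in_own_pgrp(char *const argv[])
{
    /* Current foreground group, or -1 if stdin is not our terminal. */
    pid_t fg = isatty(STDIN_FILENO) ? tcgetpgrp(STDIN_FILENO) : -1;
    struct sigaction ign, dfl, old_ttou;
    int status = 0;
    pid_t pid, r = -1;

    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    dfl = ign;
    dfl.sa_handler = SIG_DFL;

    /* tcsetpgrp() from a background group raises SIGTTOU; ignore it
       while the terminal is handed back and forth. */
    sigaction(SIGTTOU, &ign, &old_ttou);

    pid = fork();
    if (pid == 0) {
        setpgid(0, 0);                          /* new group led by the child */
        if (fg > 0)
            tcsetpgrp(STDIN_FILENO, getpgrp()); /* ^C now goes to this group */
        sigaction(SIGTTOU, &dfl, NULL);         /* default actions for the command */
        sigaction(SIGINT, &dfl, NULL);
        execvp(argv[0], argv);
        _exit(127);                             /* exec failed */
    }
    if (pid > 0) {
        setpgid(pid, pid);                      /* repeated here to avoid a race */
        if (fg > 0)
            tcsetpgrp(STDIN_FILENO, pid);
        do
            r = waitpid(pid, &status, 0);
        while (r < 0 && errno == EINTR);
        if (fg > 0)
            tcsetpgrp(STDIN_FILENO, fg);        /* take the terminal back */
    }
    sigaction(SIGTTOU, &old_ttou, NULL);
    return (pid > 0 && r == pid) ? status : -1;
}
```

With the command in its own foreground group the shell never receives the ^C, so the script simply continues after the command returns; that is the explicit opt-in behaviour the proposal asks for.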

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-09 Thread Oleg Nesterov
On 02/08, Chet Ramey wrote:
>
> On 2/8/11 7:11 PM, Ingo Molnar wrote:
> >
> > Oleg also found another simple testcase i think - and Thomas (Cc:-ed)
> > reported similar Ctrl-C problems with Bash as well.
>
> I tried to reproduce it and wasn't able to.  I use Mac OS X.

Strange, but I know nothing about Mac OS...

Hmm. Do you mean the "perl -e" test case doesn't work either? Did you
try the other test cases from http://marc.info/?l=linux-kernel&m=129623373208782
(this message was attached)?

OK, another test case:

#!./bash

perl -we 'kill INT, getppid'

echo "Hehe, I am going to sleep after ^C"
sleep 100

("perl -e" just sends SIGINT to the parent)

To clarify, I do not claim this particular case "proves" the shell
is buggy. It just illustrates the problem: the shell refuses to exit
unless the child was killed by SIGINT too.
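The `kill INT, getppid` test case can be reproduced without perl. In the sketch below, `sigint_to_parent_then_exit()` and `should_abort()` are hypothetical names, and `should_abort()` only encodes the rule as it is described in this thread (the shell received SIGINT and the child was also killed by SIGINT), not bash's actual code. The parent does receive SIGINT, but because the child then exits normally, the rule says the script keeps running.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for bash's wait_sigint_received flag. */
static volatile sig_atomic_t got_sigint;

static void note_sigint(int sig)
{
    (void)sig;
    got_sigint = 1;
}

/* The rule as described in this thread: act on ^C only if the shell
   got SIGINT and the child was also killed by SIGINT. */
static int should_abort(int shell_got_sigint, int status)
{
    return shell_got_sigint && WIFSIGNALED(status) && WTERMSIG(status) == SIGINT;
}

/* Like perl -we 'kill INT, getppid': the child signals its parent
   and then exits normally.  Returns the wait status, or -1. */
static int sigint_to_parent_then_exit(void)
{
    struct sigaction sa;
    sigset_t set;
    int status;
    pid_t pid, r;

    sa.sa_handler = note_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                   /* no SA_RESTART: waitpid may see EINTR */
    sigaction(SIGINT, &sa, NULL);
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    got_sigint = 0;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        kill(getppid(), SIGINT);       /* the "^C" reaches the shell only */
        _exit(0);                      /* ...and the child exits normally */
    }
    do
        r = waitpid(pid, &status, 0);
    while (r < 0 && errno == EINTR);   /* retry after the handler ran */
    return r == pid ? status : -1;
}
```

Because the child sends the signal before it exits, the parent's handler has always run by the time waitpid() reports the child, so `got_sigint` is set, yet `should_abort()` still returns 0.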

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-09 Thread Oleg Nesterov
On 02/09, Bob Proulx wrote:
>
> Oleg Nesterov wrote:
> > Bob Proulx wrote:
> > > Is the behavior you observe any different for this case?
> > >   $ bash -c 'while true; do /bin/true || exit 1; done'
> > > Or different for this case?
> > >   $ bash -e -c 'while true; do /bin/true; done'
> >
> > The same.
>
> I expected that to behave differently for you because I expected that
> the issue was that /bin/true was being delivered the signal but the
> exit status of /bin/true is being ignored in your test case.  In your
> test case if /bin/true caught the SIGINT then I expect the loop to
> continue.  Since you were saying that it was continuing then that is
> what I was expecting was happening.

Well, it is too late for me ;) Perhaps I misunderstood your point,
but I think it doesn't matter; see below.

> > I do not know what "-e" does (and I can't find it in man), but how
> > this can make a difference?
>
> The documentation says this about -e:
>
> [... snip ...]

Aha, thanks a lot.

> Using -e would cause the shell to exit if /bin/true returned a
> non-zero exit status.  /bin/true would exit non-zero if it caught a
> SIGINT signal.

If /bin/true gets SIGINT, everything is fine. With this particular
test case the problem is that ^C races with true/false/whatever
doing exit(any_exit_code). In this case the shell "ignores" the
signal.

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-09 Thread Oleg Nesterov
On 02/09, Bob Proulx wrote:
>
> Ingo Molnar wrote:
> > Could you try the reproducer please?
> >
> > Once you run it, try to stop it via Ctrl-C, and try to do this a
> > couple of times.
>
> I was not able to reproduce your problem using your (I believe to be
> slightly incorrect) test case:
>
>   bash -c 'while true; do /bin/true; done'
>
> It was always interrupted with a single control-C on my amd64 Debian
> Squeeze machine.  I expect this means that by chance it was always
> bash running in the foreground process and /bin/true never happened to
> be there at the right time.
>
> > Do you consider it normal that it often takes 2-3 Ctrl-C attempts to
> > interrupt that script, that it is not possible to stop the script
> > reliably with a single Ctrl-C?
>
> Since the exit status of /bin/true is ignored then I think that test
> case is flawed.  I think at the least needs to check the exit status
> of the /bin/true process.
>
>   bash -c 'while true; do /bin/true || exit 1; done'

Perhaps I misread jobs.c (this is very possible). But as far as I can
see, bash always checks "status" after waitpid(&status), and the exit
code does not matter at all. What does matter is whether WIFSIGNALED()
is true and WTERMSIG() == SIGINT.

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-09 Thread Oleg Nesterov
On 02/09, Bob Proulx wrote:
>
> Oleg Nesterov wrote:
>
> > That is why I provided another test-case, let me repeat it:
>
> Sorry but I missed seeing that the first time through or I would have
> commented.
>
> > #!./bash
> > perl -we '$SIG{INT} = sub {exit}; sleep'
> > echo "Hehe, I am going to sleep after ^C"
> > sleep 100
>
> This test case is flawed in that as written perl will eat the signal
> and ignore it.  It isn't fair to explicitly ignore the signal.

Sure! But you misunderstood. This test case does not try to prove that
bash is buggy. Quite the contrary: I created it precisely because I started
to suspect that the current behaviour is probably intentional, at least
partly.

And it illustrates how and why the test case with /bin/true can miss
a signal: from /bin/sh's point of view, "eat the signal and exit" does not
differ from the other case, where ^C races with do_exit().

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-11 Thread Oleg Nesterov
On 02/11, Illia Bobyr wrote:
>
> Do we really need to check wait_sigint_received here?
> If the child exits because of SIGINT was indeed received all the
> processes on the same terminal will also receive it.

Only if SIGINT was sent to the pgrp (as ^C sends SIGINT to the
foreground process group).

> --- bash-4.1/jobs.c~ctrlc_exit_race   2011-02-07 13:52:48.0 +0100
> +++ bash-4.1/jobs.c   2011-02-07 13:55:30.0 +0100
> @@ -3299,7 +3299,7 @@ set_job_status_and_cleanup (job)
>signals are sent to process groups) or via kill(2) to the foreground
>process by another process (or itself).  If the shell did receive the
>SIGINT, it needs to perform normal SIGINT processing. */
> -  else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
> +  else if ((WTERMSIG (child->status) == SIGINT)&&

The problem is not the case where WTERMSIG() == SIGINT; that case is
fine. Quite the contrary, we need to handle the case when the last running
command was _not_ killed but exited on its own.
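Whether every process sees the SIGINT depends on how it is sent: the terminal sends ^C to the whole foreground process group, while kill(2) with a positive pid reaches a single process. The sketch below (hypothetical helper `sigint_group()`) puts two children into a new process group and sends SIGINT either with `kill(-pgid, SIGINT)` or with `kill(pid, SIGINT)`, returning how many of the two were killed by it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child body: make sure SIGINT is not blocked, then wait for it.
   Never returns; the default SIGINT action ends the process. */
static void wait_for_sigint(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    for (;;)
        pause();
}

static int died_of_sigint(pid_t pid)
{
    int st;
    return waitpid(pid, &st, 0) == pid && WIFSIGNALED(st) && WTERMSIG(st) == SIGINT;
}

/* Put two children into one new process group, send SIGINT to the
   whole group or only to the first child, and return how many of the
   two were killed by SIGINT (-1 if fork fails). */
static int sigint_group(int whole_group)
{
    struct sigaction dfl, old;
    pid_t a, b;
    int killed;

    /* Children inherit the default SIGINT action from the start. */
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    sigaction(SIGINT, &dfl, &old);

    a = fork();
    if (a == 0) {
        setpgid(0, 0);                /* a leads a new group */
        wait_for_sigint();
    }
    if (a < 0) {
        sigaction(SIGINT, &old, NULL);
        return -1;
    }
    setpgid(a, a);                    /* repeated in the parent to avoid a race */

    b = fork();
    if (b == 0) {
        setpgid(0, a);                /* b joins a's group */
        wait_for_sigint();
    }
    sigaction(SIGINT, &old, NULL);    /* restore the caller's action */
    if (b < 0) {
        kill(a, SIGKILL);
        waitpid(a, NULL, 0);
        return -1;
    }
    setpgid(b, a);

    if (whole_group)
        kill(-a, SIGINT);             /* like ^C: every process in the group */
    else
        kill(a, SIGINT);              /* like kill(2) on a single pid */

    killed = died_of_sigint(a);
    if (whole_group) {
        killed += died_of_sigint(b);
    } else {
        kill(b, SIGKILL);             /* b never saw SIGINT; clean it up */
        waitpid(b, NULL, 0);
    }
    return killed;
}
```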

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-11 Thread Oleg Nesterov
On 02/11, Chet Ramey wrote:
>
> You do realize that this case is indistinguishable from the original
> scenario in question: the child gets the SIGINT, handles it, and exits
> successfully (or not).

I already tried to discuss this, but you didn't reply ;) See
http://www.mail-archive.com/bug-bash@gnu.org/msg08528.html

So, if I understand correctly, you mean that

#!/bin/sh

interactive_application

echo DONE

shouldn't be interrupted by SIGINT after interactive_application exits.
For example, it could be a text editor which treats SIGINT specially.

But in this case, shouldn't we fix the script above instead? Either
the shell and the application should not run in the same tty->pgrp
group, or we can add a SIGINT trap ("trap true SIGINT").

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-11 Thread Oleg Nesterov
On 02/11, Chet Ramey wrote:
>
> In the meantime, read Martin Cracauer's description of the issue.
> http://www.cons.org/cracauer/sigint.html.

I did.

OK, OK, I didn't ;) I stopped reading as soon as I started to think
I understood why you sent me this link.

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-02-13 Thread Oleg Nesterov
On 02/11, Linus Torvalds wrote:
>
> @@ -2424,6 +2425,18 @@ wait_for (pid)
> sigaction (SIGCHLD, &oact, (struct sigaction *)NULL);
> sigprocmask (SIG_SETMASK, &chldset, (sigset_t *)NULL);
>  #  endif
> +   /* If the waitchld returned EINTR, and the shell got a SIGINT,
> +  then the child has not died yet, and we assume that the
> +  child has blocked SIGINT. In that case, we require that the
> +  child return with WSIGTERM() == SIGINT to actually consider
> +  the ^C relevant. This is racy (the child may be in the
> +  process of exiting and bash reacted to the EINTR first),
> +  but this makes the race window much much smaller */

OK, I leave this up to you and Chet. At least the race is documented.

Another problem: child_blocked_sigint can be a false positive if the
signal was sent to bash directly (not to the pgrp). This means that the
next ^C won't work either.


And,

> +   if (r == -1 && errno == EINTR && wait_sigint_received)
> + {
> +   child_blocked_sigint = 1;
> + }

As far as I can see, this can't work: waitchld() can never return -1
with errno == EINTR. Perhaps waitchld() itself could set this flag, I don't know...

      /* If waitpid returns 0, there are running children.  If it returns -1,
         the only other error POSIX says it can return is EINTR. */
      CHECK_TERMSIG;
      if (pid <= 0)
        continue;       /* jumps right to the test */

The code looks strange, btw. The comment "jumps right to the test" is
correct, but this code does

	do {
		...
	} while ((sigchld || block == 0) && pid > (pid_t)0);

and this "continue" in fact means "break". So perhaps we can do

	if (pid < 0) {
		if (wait_sigint_received)
			child_blocked_sigint = 1;
		break;
	}
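The point about `continue` is ordinary C semantics: inside `do { ... } while (cond);`, `continue` jumps to the evaluation of `cond`, so once `pid > 0` is false the loop ends just as it would with `break`. The toy function below reproduces only the loop shape. `fake_waitpid()` is a hypothetical stand-in that returns scripted values, and the real condition `(sigchld || block == 0) && pid > (pid_t)0` is reduced to `pid > (pid_t)0`.

```c
#include <sys/types.h>

/* Hypothetical stand-in for WAITPID(): returns scripted results. */
static pid_t fake_waitpid(const pid_t *script, int *pos)
{
    return script[(*pos)++];
}

/* Mimics the loop shape quoted above.  Returns how many results got
   past the "pid <= 0" check; *calls receives the number of calls. */
static int count_processed(const pid_t *script, int *calls)
{
    int pos = 0, processed = 0;
    pid_t pid;

    do {
        pid = fake_waitpid(script, &pos);
        if (pid <= 0)
            continue;   /* jumps to the test below, which is now false */
        processed++;
    } while (pid > (pid_t)0);

    *calls = pos;
    return processed;
}
```

With the script `{101, 102, 0, 103}` the function processes two pids and stops after the third call; 103 is never fetched, exactly as if `break` had been written.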

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-03-07 Thread Oleg Nesterov
On 03/07, Chet Ramey wrote:
>
> > So I don't think my patch is really doing what it _intends_ to do.
>
> Let's take a step back and approach this a different way.  Instead of
> trying to intuit whether or not the child did anything with the SIGINT,
> let's try to make the race condition smaller.

OK, I'll try to test this patch later to see if it makes a difference...
at least a subjective difference.

But,

> The following patch is a very small change to jobs.c that makes
> wait_sigint_handler only pay attention and set wait_sigint_received when
> the shell is actually in waitpid() waiting for the child.  It uses a
> semaphore around the call to waitpid to effect that, with a little
> bookkeeping and cleanup code.  When the shell gets a SIGINT while not
> actually waiting for a child, it restores the old handler and sends
> SIGINT to itself.

Hmm. It is very possible I do not understand the patch correctly.

But doesn't this patch introduce another problem?

> *** 3090,3096 
> --- 3099,3107 
> waitpid_flags |= WNOHANG;
>   }
>
> +   waiting_for_child++;
> pid = WAITPID (-1, &status, waitpid_flags);

OK, and what if ^C comes before waiting_for_child++?

IIUC, in this case bash exits and leaves the current application
(say, emacs, which treats SIGINT specially) alone, no?
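As a rough sketch of the mechanism Chet describes (not the actual patch), a SIGINT handler can consult a flag that is set only around the waitpid() call. The names `waiting_for_child` and `wait_sigint_received` come from the thread. `wait_sigint_handler_sketch()` and `outcome_of_sigint()` are hypothetical, and restoring SIG_DFL stands in for "restores the old handler". Each path runs in a forked child so the test process survives. The window Oleg asks about is any SIGINT that arrives before `waiting_for_child` is incremented: it takes the second path, and the process dies of SIGINT.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t waiting_for_child;
static volatile sig_atomic_t wait_sigint_received;

static void wait_sigint_handler_sketch(int sig)
{
    struct sigaction dfl;

    if (waiting_for_child) {
        wait_sigint_received = 1;   /* let the caller decide after waitpid() */
        return;
    }
    /* Not waiting for a child: fall back to the default action and
       re-send the signal; it is delivered when the handler returns. */
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    sigaction(sig, &dfl, NULL);
    raise(sig);
}

/* Returns the wait status of a child that receives SIGINT while
   waiting_for_child has the given value, or -1 on error.  Exit
   status 42 means the handler only recorded the signal. */
static int outcome_of_sigint(int waiting)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        sigset_t set;

        sa.sa_handler = wait_sigint_handler_sketch;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGINT, &sa, NULL);
        sigemptyset(&set);
        sigaddset(&set, SIGINT);
        sigprocmask(SIG_UNBLOCK, &set, NULL);

        waiting_for_child = waiting;
        raise(SIGINT);              /* the ^C arrives here */
        _exit(wait_sigint_received ? 42 : 0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```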

Oleg.




Re: [BUG] Bash not reacting to Ctrl-C

2011-03-07 Thread Oleg Nesterov
On 03/07, Chet Ramey wrote:
>
> > On 03/07, Chet Ramey wrote:
> > >
> > > *** 3090,3096 
> > > --- 3099,3107 
> > > waitpid_flags |= WNOHANG;
> > >   }
> > >
> > > +   waiting_for_child++;
> > > pid = WAITPID (-1, &status, waitpid_flags);
> >
> > OK, and what if ^C comes before waiting_for_child++ ?
> >
> > IIUC, in this case bash exits and leaves the current application
> > (say, emacs which threats SIGINT specially) alone, no?
>
> Yes, it does.  However, the same problem exists now.  There is a window
> between the time bash forks, the child execs, and bash waits when a SIGINT
> can arrive and the same thing will happen.

OK... I think I understand: make_child() blocks SIGINT, but at this
point the signal disposition is SIG_DFL. Then it forks and unblocks the
signal without installing the handler.
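Oleg's reading of make_child() relies on a general property of blocked signals: a SIGINT that arrives while it is blocked stays pending and is acted upon when it is unblocked, using whatever disposition is installed at that moment. The sketch below (hypothetical helper `pending_sigint_outcome()`, run in a forked child) blocks SIGINT, raises it, checks with sigpending() that it is pending, and then unblocks it either with SIG_DFL still in place, so the process dies of SIGINT, or after installing a handler, so it survives. It does not reproduce make_child() itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t handled;

static void note_sigint(int sig)
{
    (void)sig;
    handled = 1;
}

/* Returns the child's wait status, or -1 on error.  Exit status 3
   means SIGINT was unexpectedly not pending while blocked; exit
   status 4 means the handler never ran. */
static int pending_sigint_outcome(int install_handler)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        sigset_t set, pending;

        sa.sa_handler = SIG_DFL;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGINT, &sa, NULL);

        sigemptyset(&set);
        sigaddset(&set, SIGINT);
        sigprocmask(SIG_BLOCK, &set, NULL);   /* SIGINT blocked around fork */
        raise(SIGINT);                        /* the ^C arrives now */

        sigpending(&pending);
        if (!sigismember(&pending, SIGINT))
            _exit(3);

        if (install_handler) {
            sa.sa_handler = note_sigint;
            sigaction(SIGINT, &sa, NULL);
        }
        sigprocmask(SIG_UNBLOCK, &set, NULL); /* pending SIGINT delivered here */
        _exit(handled ? 0 : 4);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```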

Thanks. Just out of curiosity: is this another bug/problem, or was it
intended?

Oleg.