Hello,

We believe that non-interactive bash doesn't handle CTRL-C
correctly. Please see the attached thread from lkml for
more details.

In short: bash incorrectly assumes that if it is interrupted
by ^C then the current foreground job should be killed by this
signal too. This doesn't work if the child exits normally, and
"WTERMSIG() == SIGINT" in set_job_status_and_cleanup() looks
wrong.

As Ingo reports, sometimes the shell script can miss ^C. Say,

        $ ./bash -c 'while true; do /bin/true; done'
        ^C^C

the 1st ^C does not get processed (it takes 3-5 attempts to reproduce
on my machine).

Initially I thought that the code is just racy, but let's consider
another example:

        #!./bash

        perl -we '$SIG{INT} = sub {exit}; sleep'

        echo "Hehe, I am going to sleep after ^C"
        sleep 100

it doesn't react to the 1st ^C, 100% reproducible. This does not look
right to me, but OTOH I can't believe this was not noticed before.
So perhaps there is some rationale behind this behaviour?

I do not know. The "patch" below fixes the problems, but most probably
it is not correct; I don't really understand this code.

In case you can't read "perl -e" above, it is more or less equivalent to

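        #include <signal.h>
        #include <stdlib.h>
        #include <unistd.h>

        /* Exit normally on SIGINT instead of being killed by it. */
        void int_handler(int sig)
        {
                exit(0);
        }

        int main(void)
        {
                signal(SIGINT, int_handler);
                pause();        /* sleep until a signal arrives */
                return 0;
        }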

Thanks,

Oleg.

--- bash-4.1/jobs.c~ctrlc_exit_race     2011-02-07 13:52:48.000000000 +0100
+++ bash-4.1/jobs.c     2011-02-07 13:55:30.000000000 +0100
@@ -3299,7 +3299,7 @@ set_job_status_and_cleanup (job)
         signals are sent to process groups) or via kill(2) to the foreground
         process by another process (or itself).  If the shell did receive the
         SIGINT, it needs to perform normal SIGINT processing. */
-      else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
+      else if (wait_sigint_received /*&& (WTERMSIG (child->status) == SIGINT)*/ &&
              IS_FOREGROUND (job) && IS_JOBCONTROL (job) == 0)
        {
          int old_frozen;
>From mi...@elte.hu Fri Jan 28 17:55:16 2011
Date: Fri, 28 Jan 2011 17:54:55 +0100
From: Ingo Molnar <mi...@elte.hu>
To: Tejun Heo <t...@kernel.org>
Cc: rol...@redhat.com, o...@redhat.com, jan.kratoch...@redhat.com,
        linux-ker...@vger.kernel.org, torva...@linux-foundation.org,
        a...@linux-foundation.org, Peter Zijlstra <a.p.zijls...@chello.nl>,
        Thomas Gleixner <t...@linutronix.de>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweis...@gmail.com>
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
Message-ID: <20110128165455.ga18...@elte.hu>
References: <1296227324-25295-1-git-send-email...@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1296227324-25295-1-git-send-email...@kernel.org>
User-Agent: Mutt/1.5.20 (2009-08-17)


Hi,

I'm hijacking this thread to report a signal handling bug that Linux and Bash
have, and which has been there for at least 10 years, since i started using SMP
Linux systems ...

It's not easy to reproduce, but today i found a reproducer - maybe you guys
have an idea what's going on.

There are two very simple scripts, one calls the other in an infinite loop:

 $ cat test-signal
 #!/bin/bash

 while true; do ./test-signal2; done

 $ cat test-signal2
 #!/bin/bash

 true

The bug is that occasionally Ctrl-C does not get processed, and that the Ctrl-C
is 'lost'. It can be reproduced here by running ./test-signal several times,
and Ctrl-C-ing it:

 $ ./test-signal
 ^C
 $ ./test-signal
 ^C^C
 $ ./test-signal
 ^C

See that '^C^C' line? That is where i had to do Ctrl-C twice.

It only fails here about once every 10 times, so it's very rare. I have a stock
F14 system running on that box, with the very latest .38 based kernel.

Any ideas what's going on?

Thanks,

        Ingo

>From t...@linutronix.de Fri Jan 28 18:42:07 2011
Date: Fri, 28 Jan 2011 18:41:33 +0100 (CET)
From: Thomas Gleixner <t...@linutronix.de>
To: Ingo Molnar <mi...@elte.hu>
cc: Tejun Heo <t...@kernel.org>, rol...@redhat.com, o...@redhat.com,
        jan.kratoch...@redhat.com, linux-ker...@vger.kernel.org,
        torva...@linux-foundation.org, a...@linux-foundation.org,
        Peter Zijlstra <a.p.zijls...@chello.nl>,
        =?ISO-8859-15?Q?Fr=E9d=E9ric_Weisbecker?= <fweis...@gmail.com>
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
In-Reply-To: <20110128165455.ga18...@elte.hu>
Message-ID: <alpine.LFD.2.00.1101281839390.31804@localhost6.localdomain6>
References: <1296227324-25295-1-git-send-email...@kernel.org> 
<20110128165455.ga18...@elte.hu>
User-Agent: Alpine 2.00 (LFD 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Fri, 28 Jan 2011, Ingo Molnar wrote:
> See that '^C^C' line? That is where i had to do Ctrl-C twice.
> 
> It only fails here about once every 10 times, so it's very rare. I have a
> stock F14 system running on that box, with the very latest .38 based kernel.

Tripped over the refuse ^C thing today twice. Had to kill a kernel
build from another shell. It just happily displayed ^C and never
stopped. That happens once in a while and I have no idea either how to
debug that.

Thanks,

        tglx

>From anca.eman...@gmail.com Fri Jan 28 19:04:25 2011
MIME-Version: 1.0
In-Reply-To: <alpine.LFD.2.00.1101281839390.31804@localhost6.localdomain6>
References: <1296227324-25295-1-git-send-email...@kernel.org>
        <20110128165455.ga18...@elte.hu>
        <alpine.LFD.2.00.1101281839390.31804@localhost6.localdomain6>
Date: Fri, 28 Jan 2011 20:04:13 +0200
Message-ID: <aanlktinm53a1bzlu6jcbdgopkhw9mog2h4gd7xiry...@mail.gmail.com>
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
From: Anca Emanuel <anca.eman...@gmail.com>
To: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@elte.hu>, Tejun Heo <t...@kernel.org>, rol...@redhat.com,
        o...@redhat.com, jan.kratoch...@redhat.com,
        linux-ker...@vger.kernel.org, torva...@linux-foundation.org,
        a...@linux-foundation.org, Peter Zijlstra <a.p.zijls...@chello.nl>,
        =?ISO-8859-1?Q?Fr=E9d=E9ric_Weisbecker?= <fweis...@gmail.com>,
        Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jan 28, 2011 at 7:41 PM, Thomas Gleixner <t...@linutronix.de> wrote:
> On Fri, 28 Jan 2011, Ingo Molnar wrote:
>> See that '^C^C' line? That is where i had to do Ctrl-C twice.
>>
>> It only fails here about once every 10 times, so it's very rare. I have a
>> stock F14 system running on that box, with the very latest .38 based kernel.
>
> Tripped over the refuse ^C thing today twice. Had to kill a kernel
> build from another shell. It just happily displayed ^C and never
> stopped. That happens once in a while and I have no idea either how to
> debug that.

cc: Mathieu

Use lttng ?

>From comp...@mail.openrapids.net Fri Jan 28 19:37:03 2011
Date: Fri, 28 Jan 2011 13:36:50 -0500
From: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
To: Anca Emanuel <anca.eman...@gmail.com>
Cc: Thomas Gleixner <t...@linutronix.de>, Ingo Molnar <mi...@elte.hu>,
        Tejun Heo <t...@kernel.org>, rol...@redhat.com, o...@redhat.com,
        jan.kratoch...@redhat.com, linux-ker...@vger.kernel.org,
        torva...@linux-foundation.org, a...@linux-foundation.org,
        Peter Zijlstra <a.p.zijls...@chello.nl>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweis...@gmail.com>
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
Message-ID: <20110128183650.GA26633@Krystal>
References: <1296227324-25295-1-git-send-email...@kernel.org> 
<20110128165455.ga18...@elte.hu> 
<alpine.LFD.2.00.1101281839390.31804@localhost6.localdomain6> 
<aanlktinm53a1bzlu6jcbdgopkhw9mog2h4gd7xiry...@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <aanlktinm53a1bzlu6jcbdgopkhw9mog2h4gd7xiry...@mail.gmail.com>
X-Editor: vi
X-Info: http://www.efficios.com
X-Operating-System: Linux/2.6.26-2-686 (i686)
X-Uptime: 13:29:41 up 65 days, 23:32,  1 user,  load average: 0.19, 0.09,
        0.05
User-Agent: Mutt/1.5.18 (2008-05-17)

* Anca Emanuel (anca.eman...@gmail.com) wrote:
> On Fri, Jan 28, 2011 at 7:41 PM, Thomas Gleixner <t...@linutronix.de> wrote:
> > On Fri, 28 Jan 2011, Ingo Molnar wrote:
> >> See that '^C^C' line? That is where i had to do Ctrl-C twice.
> >>
> >> It only fails here about once every 10 times, so it's very rare. I have a
> >> stock F14 system running on that box, with the very latest .38 based kernel.
> >
> > Tripped over the refuse ^C thing today twice. Had to kill a kernel
> > build from another shell. It just happily displayed ^C and never
> > stopped. That happens once in a while and I have no idea either how to
> > debug that.
> 
> cc: Mathieu
> 
> Use lttng ?

Heh :) I'm sure Ingo and Thomas have their own tools for that ;) There is
one extra thing in the LTTng instrumentation that can help solve this problem:
the "input subsystem" instrumentation (enabled with ltt-armall -i). You can then
get a dump of:

- Your keystrokes (you can then grep for your ctrl-c input)
- Read/poll/select system calls (so you know when your terminal receives the
  input).
- Signals sent/delivered

Some of these are already instrumented in the mainline kernel, so you might get
away without the input subsystem instrumentation.

If I had to take a wild guess, my bet would be to take a look in the area of
signal delivery, but you never know, maybe it's a userspace bug in the X
terminal emulator code that is causing this weirdness.

Hope this helps,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

>From o...@redhat.com Fri Jan 28 18:55:32 2011
Date: Fri, 28 Jan 2011 18:55:33 +0100
From: Oleg Nesterov <o...@redhat.com>
To: Ingo Molnar <mi...@elte.hu>
Cc: Tejun Heo <t...@kernel.org>, rol...@redhat.com, jan.kratoch...@redhat.com,
        linux-ker...@vger.kernel.org, torva...@linux-foundation.org,
        a...@linux-foundation.org, Peter Zijlstra <a.p.zijls...@chello.nl>,
        Thomas Gleixner <t...@linutronix.de>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweis...@gmail.com>
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
Message-ID: <20110128175532.ga26...@redhat.com>
References: <1296227324-25295-1-git-send-email...@kernel.org> 
<20110128165455.ga18...@elte.hu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110128165455.ga18...@elte.hu>
User-Agent: Mutt/1.5.18 (2008-05-17)

On 01/28, Ingo Molnar wrote:
>
> The bug is that occasionally Ctrl-C does not get processed, and that the
> Ctrl-C is 'lost'. It can be reproduced here by running ./test-signal several
> times, and Ctrl-C-ing it:
>
>  $ ./test-signal
>  ^C
>  $ ./test-signal
>  ^C^C
>  $ ./test-signal
>  ^C
>
> See that '^C^C' line? That is where i had to do Ctrl-C twice.

Reproduced.

At first glance, /bin/sh should be blamed... Hmm, probably yes,
I even reproduced this under strace, and this is what I see

        wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
        --- SIGINT (Interrupt) @ 0 (0) ---
        rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
        wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706

So, ^C is not lost, but ./test-signal doesn't want to exit.




This is what ./test-signal does when ^C does work:

        wait4(-1, 0x7fff1c283b74, 0, NULL)      = ? ERESTARTSYS (To be restarted)
        --- SIGINT (Interrupt) @ 0 (0) ---
        rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)

OK, it doesn't exit immediately, but then it kills itself:

        wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], 0, NULL) = 19585
        rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f3c3035b150}, {0x433d30, [], SA_RESTORER, 0x7f3c3035b150}, 8) = 0
        rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f3c3035b150}, {SIG_DFL, [], SA_RESTORER, 0x7f3c3035b150}, 8) = 0
        kill(19584, SIGINT)




Looking into the previous log (when it doesn't exit) again,

        wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
        --- SIGINT (Interrupt) @ 0 (0) ---
        rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
        wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706
        rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
        --- SIGCHLD (Child exited) @ 0 (0) ---
        wait4(-1, 0x7fff38842d24, WNOHANG, NULL) = -1 ECHILD (No child processes)
        rt_sigreturn(0x8)                       = 0
        rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f3cbdbd0150}, {0x433d30, [], SA_RESTORER, 0x7f3cbdbd0150}, 8) = 0
        rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
        rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
        rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
        clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3cbe9ab780) = 9707

Perhaps the handler for SIGCHLD clears some internal i_am_going_to_exit flag,
I dunno.

Oleg.

>From mi...@elte.hu Fri Jan 28 19:30:18 2011
Date: Fri, 28 Jan 2011 19:29:47 +0100
From: Ingo Molnar <mi...@elte.hu>
To: Oleg Nesterov <o...@redhat.com>
Cc: Tejun Heo <t...@kernel.org>, rol...@redhat.com, jan.kratoch...@redhat.com,
        linux-ker...@vger.kernel.org, torva...@linux-foundation.org,
        a...@linux-foundation.org, Peter Zijlstra <a.p.zijls...@chello.nl>,
        Thomas Gleixner <t...@linutronix.de>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweis...@gmail.com>
Subject: Re: Bash not reacting to Ctrl-C
Message-ID: <20110128182947.gb20...@elte.hu>
References: <1296227324-25295-1-git-send-email...@kernel.org>
 <20110128165455.ga18...@elte.hu>
 <20110128175532.ga26...@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110128175532.ga26...@redhat.com>
User-Agent: Mutt/1.5.20 (2009-08-17)


* Oleg Nesterov <o...@redhat.com> wrote:

> On 01/28, Ingo Molnar wrote:
> >
> > The bug is that occasionally Ctrl-C does not get processed, and that the
> > Ctrl-C is 'lost'. It can be reproduced here by running ./test-signal
> > several times, and Ctrl-C-ing it:
> >
> >  $ ./test-signal
> >  ^C
> >  $ ./test-signal
> >  ^C^C
> >  $ ./test-signal
> >  ^C
> >
> > See that '^C^C' line? That is where i had to do Ctrl-C twice.
> 
> Reproduced.
> 
> At first glance, /bin/sh should be blamed... Hmm, probably yes,
> I even reproduced this under strace, and this is what I see
> 
>       wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
>       --- SIGINT (Interrupt) @ 0 (0) ---
>       rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
>       wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706
> 
> So, ^C is not lost, but ./test-signal doesn't want to exit.

Might be some Bash assumption or race that works under other OSs but somehow
Linux does differently. IIRC Bash is being developed on MacOS-X.

But it's happening all the time (with yum for example - but also with makejobs,
as Thomas has reported it) - this is simply the first time i managed to
reproduce it with something really simple.

Thanks,

        Ingo

>From o...@redhat.com Sat Feb  5 21:34:22 2011
Date: Sat, 5 Feb 2011 21:34:22 +0100
From: Oleg Nesterov <o...@redhat.com>
To: Ingo Molnar <mi...@elte.hu>
Cc: Tejun Heo <t...@kernel.org>, rol...@redhat.com, jan.kratoch...@redhat.com,
        linux-ker...@vger.kernel.org, torva...@linux-foundation.org,
        a...@linux-foundation.org, Peter Zijlstra <a.p.zijls...@chello.nl>,
        Thomas Gleixner <t...@linutronix.de>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweis...@gmail.com>
Subject: Re: Bash not reacting to Ctrl-C
Message-ID: <20110205203422.ga12...@redhat.com>
References: <1296227324-25295-1-git-send-email...@kernel.org> 
<20110128165455.ga18...@elte.hu> <20110128175532.ga26...@redhat.com> 
<20110128182947.gb20...@elte.hu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110128182947.gb20...@elte.hu>
User-Agent: Mutt/1.5.18 (2008-05-17)

On 01/28, Ingo Molnar wrote:
>
> * Oleg Nesterov <o...@redhat.com> wrote:
>
> > On 01/28, Ingo Molnar wrote:
> > >
> > > The bug is that occasionally Ctrl-C does not get processed, and that the
> > > Ctrl-C is 'lost'. It can be reproduced here by running ./test-signal
> > > several times, and Ctrl-C-ing it:
> > >
> > >  $ ./test-signal
> > >  ^C
> > >  $ ./test-signal
> > >  ^C^C
> > >  $ ./test-signal
> > >  ^C
> > >
> > > See that '^C^C' line? That is where i had to do Ctrl-C twice.
> >
> > Reproduced.
> >
> > At first glance, /bin/sh should be blamed... Hmm, probably yes,
> > I even reproduced this under strace, and this is what I see
> >
> >     wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
> >     --- SIGINT (Interrupt) @ 0 (0) ---
> >     rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
> >     wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706
> >
> > So, ^C is not lost, but ./test-signal doesn't want to exit.
>
> Might be some Bash assumption or race that works under other OSs but somehow
> Linux does differently. IIRC Bash is being developed on MacOS-X.
>
> But it's happening all the time (with yum for example - but also with
> makejobs, as Thomas has reported it) - this is simply the first time i managed
> to reproduce it with something really simple.

OK, I seem to understand what happens. Of course I am not sure, I never
looked into these sources before...

Suppose that jctl ^C races with the normal child exit. In this case
waitchld() sets child->status = status (zero in this case) and calls
set_job_status_and_cleanup().

set_job_status_and_cleanup() notices wait_sigint_received and sends
SIGINT to itself (termsig_handler (SIGINT)), but somehow it assumes
that the last foreground job should be terminated by SIGINT too:

         else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&

Then the next wait_for() clears wait_sigint_received and bash
loses ^C

Oleg.

>From o...@redhat.com Mon Feb  7 14:08:41 2011
Date: Mon, 7 Feb 2011 14:08:41 +0100
From: Oleg Nesterov <o...@redhat.com>
To: Ingo Molnar <mi...@elte.hu>
Cc: Tejun Heo <t...@kernel.org>, rol...@redhat.com, jan.kratoch...@redhat.com,
        linux-ker...@vger.kernel.org, torva...@linux-foundation.org,
        a...@linux-foundation.org, Peter Zijlstra <a.p.zijls...@chello.nl>,
        Thomas Gleixner <t...@linutronix.de>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweis...@gmail.com>
Subject: Re: Bash not reacting to Ctrl-C
Message-ID: <20110207130841.ga16...@redhat.com>
References: <1296227324-25295-1-git-send-email...@kernel.org> 
<20110128165455.ga18...@elte.hu> <20110128175532.ga26...@redhat.com> 
<20110128182947.gb20...@elte.hu> <20110205203422.ga12...@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110205203422.ga12...@redhat.com>
User-Agent: Mutt/1.5.18 (2008-05-17)

On 02/05, Oleg Nesterov wrote:
>
> On 01/28, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <o...@redhat.com> wrote:
> >
> > > On 01/28, Ingo Molnar wrote:
> > > >
> > > > The bug is that occasionally Ctrl-C does not get processed, and that
> > > > the Ctrl-C is 'lost'. It can be reproduced here by running
> > > > ./test-signal several times, and Ctrl-C-ing it:
> > > >
> > > >  $ ./test-signal
> > > >  ^C
> > > >  $ ./test-signal
> > > >  ^C^C
> > > >  $ ./test-signal
> > > >  ^C
> > > >
> > > > See that '^C^C' line? That is where i had to do Ctrl-C twice.
> > >
> > > Reproduced.
> > >
> > > At first glance, /bin/sh should be blamed... Hmm, probably yes,
> > > I even reproduced this under strace, and this is what I see
> > >
> > >   wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
> > >   --- SIGINT (Interrupt) @ 0 (0) ---
> > >   rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
> > >   wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706
> > >
> > > So, ^C is not lost, but ./test-signal doesn't want to exit.
> >
> > Might be some Bash assumption or race that works under other OSs but
> > somehow Linux does differently. IIRC Bash is being developed on MacOS-X.
> >
> > But it's happening all the time (with yum for example - but also with
> > makejobs, as Thomas has reported it) - this is simply the first time i
> > managed to reproduce it with something really simple.
>
> OK, I seem to understand what happens. Of course I am not sure, I never
> looked into these sources before...
>
> Suppose that jctl ^C races with the normal child exit. In this case
> waitchld() sets child->status = status (zero in this case) and calls
> set_job_status_and_cleanup().
>
> set_job_status_and_cleanup() notices wait_sigint_received and sends
> SIGINT to itself (termsig_handler (SIGINT)), but somehow it assumes
> that the last foreground job should be terminated by SIGINT too:
>
>        else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
>
> Then the next wait_for() clears wait_sigint_received and bash
> loses ^C

IOW.

Now that it is clear what happens, the test-case becomes even more
trivial:

        bash-4.1$ ./bash -c 'while true; do /bin/true; done'
        ^C^C

needs 4-5 attempts on my machine.

The patch below fixes the problem, but most probably it is not
correct. I don't understand the point of the "status == SIGINT"
check, though; we have already checked that this job is dead. But I
won't pretend I really understand this code.

Oleg.

--- bash-4.1/jobs.c~ctrlc_exit_race     2011-02-07 13:52:48.000000000 +0100
+++ bash-4.1/jobs.c     2011-02-07 13:55:30.000000000 +0100
@@ -3299,7 +3299,7 @@ set_job_status_and_cleanup (job)
         signals are sent to process groups) or via kill(2) to the foreground
         process by another process (or itself).  If the shell did receive the
         SIGINT, it needs to perform normal SIGINT processing. */
-      else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
+      else if (wait_sigint_received /*&& (WTERMSIG (child->status) == SIGINT)*/ &&
              IS_FOREGROUND (job) && IS_JOBCONTROL (job) == 0)
        {
          int old_frozen;
