Bug where SIGINT trap handler isn't called

2015-06-29 Thread Patrick Plagwitz
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu'
-DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash'
-DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib
-D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe
-fstack-protector-strong --param=ssp-buffer-size=4
-DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin'
-DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc'
-DSYS_BASH_LOGOUT='/etc/bash.bash_logout'
uname output: Linux PatrickDesktop 3.19.2-1-ARCH #1 SMP PREEMPT Wed Mar
18 16:21:02 CET 2015 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.3
Patch Level: 39
Release Status: release

Description:
There's a bug that happens when waiting for a child process to complete
(via wait_for in jobs.c) that isn't part of a job, like command
substitution subshells. If a SIGINT is caught by the
wait_sigint_handler, the handler sets the wait_sigint_received flag
which is checked in set_job_status_and_cleanup. But
set_job_status_and_cleanup is called from waitchld only if the pid
belongs to a process that is part of a job. The result is that a SIGINT
trap, if it exists, is not called if a SIGINT is received while waiting
for a comsub subshell.

Repeat-By:
Running the following script (requires python) and giving it a SIGINT
(via ^C) when it blocks doesn't produce any output from the script.
Because SIGINT is trapped it should print “trapped”, however. Closing
stdout causes wait_for to be called in command_substitute:subst.c.

subst-sleep.sh:
  trap 'echo trapped >&2; exit 1' INT

  foo="$(python -c 'import time, os; os.close(1); time.sleep(20)')"

Fix:
The attached patch makes a conservative change that moves the code that
calls the SIGINT trap handler from set_job_status_and_cleanup to a new
function handle_sigint_caught_during_wait. This function is then
additionally called from waitchld for dead children that don't belong to
a job.

Related to the code the fix touches: handle_sigint_caught_during_wait
contains some duplicated code. Also, wait_for calls waitchld as long as
the child or job is still running and waitchld and
set_job_status_and_cleanup in turn call handle_sigint_caught_during_wait
if the child or job is dead. This seems to ensure that the SIGINT trap
handler is called at most once for a call to wait_for but isn't wait_for
a better place to check wait_sigint_received after having restored the
sigint handler (restore_sigint_handler()), especially since it already
contains CHECK_WAIT_INTR. Not to mention that a SIGINT could be received
after checking the wait_sigint_received flag and before restoring the
signal handler, as far as I understand it.
--- a/jobs.c
+++ b/jobs.c
@@ -260,6 +260,7 @@ static int print_job __P((JOB *, int, int, int));
 static int process_exit_status __P((WAIT));
 static int process_exit_signal __P((WAIT));
 static int set_job_status_and_cleanup __P((int));
+static void handle_sigint_caught_during_wait __P((PROCESS *, int));
 
 static WAIT job_signal_status __P((int));
 static WAIT raw_job_exit_status __P((int));
@@ -3311,7 +3312,11 @@ itrace("waitchld: waitpid returns %d block = %d", pid, 
block);
}
 
   if (job == NO_JOB)
-   continue;
+{
+  if (PEXITED (child))
+handle_sigint_caught_during_wait (child, NO_JOB);
+  continue;
+}
 
   call_set_current += set_job_status_and_cleanup (job);
 
@@ -3371,18 +3376,104 @@ itrace("waitchld: waitpid returns %d block = %d", pid, 
block);
   return (children_exited);
 }
 
-/* Set the status of JOB and perform any necessary cleanup if the job is
-   marked as JDEAD.
+/* Handle SIGINTs received while waiting for a foreground JOB, or CHILD if it 
+   isn't part of a job, to finish. */
+static void handle_sigint_caught_during_wait (child, job)
+ PROCESS *child;
+ int job;
+{
+  SigHandler *temp_handler;
+  if (job != NO_JOB)
+child = jobs[job]->pipe;
+  int tstatus, job_has_job_control, is_foreground_job;
+
+  job_has_job_control = job == NO_JOB ? job_control != 0 : IS_JOBCONTROL (job);
+  is_foreground_job = job == NO_JOB ? interactive_shell == 0 : 
+IS_FOREGROUND (job);
+
+  if (!(wait_sigint_received && is_foreground_job))
+return;
+
+  /* If we're running a shell script and we get a SIGINT with a
+ SIGINT trap handler, but the foreground job handles it and
+ does not exit due to SIGINT, run the trap handler but do not
+ otherwise act as if we got the interrupt. */
+  if (interactive_shell == 0 &&
+  child_caught_sigint && signal_is_trapped (SIGINT))
+  {
+int old_frozen;
+wait_sigint_received = 0;
+last_command_exit_value = process_exit_status (child->status);
+
+old_frozen = jobs_list_frozen;
+jobs_list_frozen = 1;
+tstatus = maybe_call_trap_handler (SI

Re: Bug where SIGINT trap handler isn't called

2015-07-15 Thread Patrick Plagwitz
On 30/06/15 02:27, Chet Ramey wrote:
> On 6/29/15 3:41 PM, Patrick Plagwitz wrote:
> 
>> Bash Version: 4.3
>> Patch Level: 39
>> Release Status: release
>>
>> Description:
>> There's a bug that happens when waiting for a child process to complete
>> (via wait_for in jobs.c) that isn't part of a job, like command
>> substitution subshells. If a SIGINT is caught by the
>> wait_sigint_handler, the handler sets the wait_sigint_received flag
>> which is checked in set_job_status_and_cleanup. But
>> set_job_status_and_cleanup is called from waitchld only if the pid
>> belongs to a process that is part of a job. The result is that a SIGINT
>> trap, if it exists, is not called if a SIGINT is received while waiting
>> for a comsub subshell.
> 
> This isn't exactly correct.  The SIGINT trap will be called if the process
> in the command substitution exits due to SIGINT.  For instance, if you
> change the assignment to something like:
> 
> foo=$(sleep 20 >&- )
> or
> foo=$(exec 2>&- ; sleep 20)
> 
> both of which I think correctly reproduce the python command, the SIGINT
> trap will execute.
> 
> This is another case of the scenario most recently described in
> 
> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html
> 
> In this case, python appears to catch the SIGINT (it looks like a
> KeyboardInterrupt exception), print the message, and exit with status 1.
> 
> Chet
> 

Ok, I see.
However, there appears to be some race condition when waiting for a
command substitution.
I have the attached combination of scripts.

When run with
$ bash run-until-end-of-loop.sh bash start-subst-loop.sh '2>/dev/null'
, the script will try to launch and SIGINT subst-loop.sh repeatedly
until the SIGINT trap is once *not* called which will happen in at most
200 repetitions on my machine. The script then ends after printing “end
of loop”. The python script execute-with-sigint.py is only there to
enable subst-loop.sh to receive a SIGINT at all.
subst-loop.sh calls date(1) in a loop; date should have SIGINT set to
SIG_DFL other than python initially.

I analyzed the execution by inserting some debug output into the bash
code. It seems that in the case that the SIGINT trap is not called
subst-loop.sh gets the SIGINT while (or shortly before or after) calling
waitpid in waitchld:jobs.c which will then return without errno ==
EINTR. The wait_sigint_handler will be called, though, and so
wait_sigint_received will be true.

If you try the script often enough, it will sometimes block. I saw that
this is because wait_sigint_handler calls itself in an infinite loop.
If I understand it correctly, this happens since wait_for:jobs.c calls
  old_sigint_handler = set_signal_handler (SIGINT, wait_sigint_handler);
and if a SIGINT is caught after sigaction in set_signal_handler was
called but before old_sigint_handler is set,
  restore_sigint_handler ();
  kill (getpid (), SIGINT);
in wait_sigint_handler will leave wait_sigint_handler as handler.


run-until-end-of-loop.sh
Description: application/shellscript


start-subst-loop.sh
Description: application/shellscript
import signal
import os
import sys

signal.signal(signal.SIGINT, signal.SIG_DFL)

sys.stderr.write(os.environ["SHELL"] + "\n")
os.execv(os.environ["SHELL"], [os.environ["SHELL"]] + sys.argv[1:])


subst-loop.sh
Description: application/shellscript


Re: Bug where SIGINT trap handler isn't called

2015-07-25 Thread Patrick Plagwitz
On 22/07/15 04:01, Chet Ramey wrote:
> On 7/16/15 12:05 AM, Patrick Plagwitz wrote:
> 
>>> This is another case of the scenario most recently described in
>>>
>>> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html
>>>
>>> In this case, python appears to catch the SIGINT (it looks like a
>>> KeyboardInterrupt exception), print the message, and exit with status 1.
>>>
>>> Chet
>>>
>>
>> Ok, I see.
>> However, there appears to be some race condition when waiting for a
>> command substitution.
>> I have the attached combination of scripts.
>>
>> When run with
>> $ bash run-until-end-of-loop.sh bash start-subst-loop.sh '2>/dev/null'
>> , the script will try to launch and SIGINT subst-loop.sh repeatedly
>> until the SIGINT trap is once *not* called which will happen in at most
>> 200 repetitions on my machine. The script then ends after printing “end
>> of loop”. The python script execute-with-sigint.py is only there to
>> enable subst-loop.sh to receive a SIGINT at all.
>> subst-loop.sh calls date(1) in a loop; date should have SIGINT set to
>> SIG_DFL other than python initially.
>>
>> I analyzed the execution by inserting some debug output into the bash
>> code. It seems that in the case that the SIGINT trap is not called
>> subst-loop.sh gets the SIGINT while (or shortly before or after) calling
>> waitpid in waitchld:jobs.c which will then return without errno ==
>> EINTR. The wait_sigint_handler will be called, though, and so
>> wait_sigint_received will be true.
> 
> This doesn't agree with what I see on RHEL6.  I get waitpid() returning -1/
> EINTR, which bash interprets, using a heuristic, to mean that the child
> blocked or caught SIGINT, in which case bash should not act as if it
> received it.
> 
> There is a small race condition here, which is very hard to close while
> maintaining the desired behavior: bash only responds to SIGINT received
> while waiting for a child if the child exits due to SIGINT.  You appear
> to have hit it: the timing of the ^C is such that waitpid returns
> -1/EINTR, causing the shell to ignore the ^C.  I suspect this is because
> the child called exit(0) before the ^C arrived and the shell got the
> signal, but the child exited successfully, causing the shell to assume
> the child blocked or caught the SIGINT.
> 
> That heuristic was developed as the result of an extensive discussion
> between me and several Linux kernel developers back in 2011.  You can
> read that here:
> 
> http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00050.html
> http://lists.gnu.org/archive/html/bug-bash/2011-03/msg0.html
> 
> I will look at the signal handler race condition you identified.
> 
> Chet
> 

Thanks. And thanks for the information.

So
$ bash -c 'python -c "import time; time.sleep(10)"; echo foo'
outputs foo when ^Cd because bash suppresses normal SIGINT handling if
the child is assumed to have handled it itself. Meanwhile
$ bash -c 'sleep 10; echo foo'
doesn't output foo.

But custom trap handlers are called either way in
set_job_status_and_cleanup.
$ bash -c 'trap "echo trap; exit" INT; python -c "import time;
time.sleep(10)"; echo foo'
prints trap, not foo, as does
$ bash -c 'trap "echo trap; exit" INT; sleep 10; echo foo'

I think command substitution has two issues. Those are because
set_job_and_cleanup isn't called for the single child made for comsub.
(1) The bug that was fixed in the discussion you linked is still in for
comsub waits:
$ bash -c 'while [ "$(exec >&-; sleep 0.001)" = "" ]; do :; done'
sometimes requires two ^Cs to stop.
The reason seems to be the same race condition described in
http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00073.html
and in your and in my last mail. Checking last_command_exit_value ==
(128 + SIGINT) in command_substitute reflects the way the now-called
child_caught_sigint was determined before the patch made during the
discussion (i.e. it doesn't implement the heuristics).

As a side note,
http://lists.gnu.org/archive/html/bug-bash/2011-03/msg00039.html
explained to me why this outcome of the race condition is at all likely.

(2) Running the comsub version of one of the above scripts:
$ bash -c 'trap "echo trap; exit" INT; foo="$(exec >&-; python -c
"import time; time.sleep(10)")"; echo foo'
yields another result (it prints foo, not trap). This is the behavior
the original bug report was about and you already replied to it but is
the difference between comsub children and normal ones here intended?