Bug where SIGINT trap handler isn't called
Configuration Information [Automatically generated, do not change]: Machine: x86_64 OS: linux-gnu Compiler: gcc Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' -DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin' -DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc' -DSYS_BASH_LOGOUT='/etc/bash.bash_logout' uname output: Linux PatrickDesktop 3.19.2-1-ARCH #1 SMP PREEMPT Wed Mar 18 16:21:02 CET 2015 x86_64 GNU/Linux Machine Type: x86_64-unknown-linux-gnu Bash Version: 4.3 Patch Level: 39 Release Status: release Description: There's a bug that happens when waiting for a child process to complete (via wait_for in jobs.c) that isn't part of a job, like command substitution subshells. If a SIGINT is caught by the wait_sigint_handler, the handler sets the wait_sigint_received flag which is checked in set_job_status_and_cleanup. But set_job_status_and_cleanup is called from waitchld only if the pid belongs to a process that is part of a job. The result is that a SIGINT trap, if it exists, is not called if a SIGINT is received while waiting for a comsub subshell. Repeat-By: Running the following script (requires python) and giving it a SIGINT (via ^C) when it blocks doesn't produce any output from the script. Because SIGINT is trapped it should print “trapped”, however. Closing stdout causes wait_for to be called in command_substitute:subst.c. subst-sleep.sh: trap 'echo trapped >&2; exit 1' INT foo="$(python -c 'import time, os; os.close(1); time.sleep(20)')" Fix: The attached patch makes a conservative change that moves the code that calls the SIGINT trap handler from set_job_status_and_cleanup to a new function handle_sigint_caught_during_wait. This function is then additionally called from waitchld for dead children that don't belong to a job. Related to the code the fix touches: handle_sigint_caught_during_wait contains some duplicated code. Also, wait_for calls waitchld as long as the child or job is still running and waitchld and set_job_status_and_cleanup in turn call handle_sigint_caught_during_wait if the child or job is dead. This seems to ensure that the SIGINT trap handler is called at most once for a call to wait_for but isn't wait_for a better place to check wait_sigint_received after having restored the sigint handler (restore_sigint_handler()), especially since it already contains CHECK_WAIT_INTR. Not to mention that a SIGINT could be received after checking the wait_sigint_received flag and before restoring the signal handler, as far as I understand it. --- a/jobs.c +++ b/jobs.c @@ -260,6 +260,7 @@ static int print_job __P((JOB *, int, int, int)); static int process_exit_status __P((WAIT)); static int process_exit_signal __P((WAIT)); static int set_job_status_and_cleanup __P((int)); +static void handle_sigint_caught_during_wait __P((PROCESS *, int)); static WAIT job_signal_status __P((int)); static WAIT raw_job_exit_status __P((int)); @@ -3311,7 +3312,11 @@ itrace("waitchld: waitpid returns %d block = %d", pid, block); } if (job == NO_JOB) - continue; +{ + if (PEXITED (child)) +handle_sigint_caught_during_wait (child, NO_JOB); + continue; +} call_set_current += set_job_status_and_cleanup (job); @@ -3371,18 +3376,104 @@ itrace("waitchld: waitpid returns %d block = %d", pid, block); return (children_exited); } -/* Set the status of JOB and perform any necessary cleanup if the job is - marked as JDEAD. +/* Handle SIGINTs received while waiting for a foreground JOB, or CHILD if it + isn't part of a job, to finish. */ +static void handle_sigint_caught_during_wait (child, job) + PROCESS *child; + int job; +{ + SigHandler *temp_handler; + if (job != NO_JOB) +child = jobs[job]->pipe; + int tstatus, job_has_job_control, is_foreground_job; + + job_has_job_control = job == NO_JOB ? job_control != 0 : IS_JOBCONTROL (job); + is_foreground_job = job == NO_JOB ? interactive_shell == 0 : +IS_FOREGROUND (job); + + if (!(wait_sigint_received && is_foreground_job)) +return; + + /* If we're running a shell script and we get a SIGINT with a + SIGINT trap handler, but the foreground job handles it and + does not exit due to SIGINT, run the trap handler but do not + otherwise act as if we got the interrupt. */ + if (interactive_shell == 0 && + child_caught_sigint && signal_is_trapped (SIGINT)) + { +int old_frozen; +wait_sigint_received = 0; +last_command_exit_value = process_exit_status (child->status); + +old_frozen = jobs_list_frozen; +jobs_list_frozen = 1; +tstatus = maybe_call_trap_handler (SI
Re: Bug where SIGINT trap handler isn't called
On 30/06/15 02:27, Chet Ramey wrote: > On 6/29/15 3:41 PM, Patrick Plagwitz wrote: > >> Bash Version: 4.3 >> Patch Level: 39 >> Release Status: release >> >> Description: >> There's a bug that happens when waiting for a child process to complete >> (via wait_for in jobs.c) that isn't part of a job, like command >> substitution subshells. If a SIGINT is caught by the >> wait_sigint_handler, the handler sets the wait_sigint_received flag >> which is checked in set_job_status_and_cleanup. But >> set_job_status_and_cleanup is called from waitchld only if the pid >> belongs to a process that is part of a job. The result is that a SIGINT >> trap, if it exists, is not called if a SIGINT is received while waiting >> for a comsub subshell. > > This isn't exactly correct. The SIGINT trap will be called if the process > in the command substitution exits due to SIGINT. For instance, if you > change the assignment to something like: > > foo=$(sleep 20 >&- ) > or > foo=$(exec 2>&- ; sleep 20) > > both of which I think correctly reproduce the python command, the SIGINT > trap will execute. > > This is another case of the scenario most recently described in > > http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html > > In this case, python appears to catch the SIGINT (it looks like a > KeyboardInterrupt exception), print the message, and exit with status 1. > > Chet > Ok, I see. However, there appears to be some race condition when waiting for a command substitution. I have the attached combination of scripts. When run with $ bash run-until-end-of-loop.sh bash start-subst-loop.sh '2>/dev/null' , the script will try to launch and SIGINT subst-loop.sh repeatedly until the SIGINT trap is once *not* called which will happen in at most 200 repetitions on my machine. The script then ends after printing “end of loop”. The python script execute-with-sigint.py is only there to enable subst-loop.sh to receive a SIGINT at all. subst-loop.sh calls date(1) in a loop; date should have SIGINT set to SIG_DFL other than python initially. I analyzed the execution by inserting some debug output into the bash code. It seems that in the case that the SIGINT trap is not called subst-loop.sh gets the SIGINT while (or shortly before or after) calling waitpid in waitchld:jobs.c which will then return without errno == EINTR. The wait_sigint_handler will be called, though, and so wait_sigint_received will be true. If you try the script often enough, it will sometimes block. I saw that this is because wait_sigint_handler calls itself in an infinite loop. If I understand it correctly, this happens since wait_for:jobs.c calls old_sigint_handler = set_signal_handler (SIGINT, wait_sigint_handler); and if a SIGINT is caught after sigaction in set_signal_handler was called but before old_sigint_handler is set, restore_sigint_handler (); kill (getpid (), SIGINT); in wait_sigint_handler will leave wait_sigint_handler as handler. run-until-end-of-loop.sh Description: application/shellscript start-subst-loop.sh Description: application/shellscript import signal import os import sys signal.signal(signal.SIGINT, signal.SIG_DFL) sys.stderr.write(os.environ["SHELL"] + "\n") os.execv(os.environ["SHELL"], [os.environ["SHELL"]] + sys.argv[1:]) subst-loop.sh Description: application/shellscript
Re: Bug where SIGINT trap handler isn't called
On 22/07/15 04:01, Chet Ramey wrote: > On 7/16/15 12:05 AM, Patrick Plagwitz wrote: > >>> This is another case of the scenario most recently described in >>> >>> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html >>> >>> In this case, python appears to catch the SIGINT (it looks like a >>> KeyboardInterrupt exception), print the message, and exit with status 1. >>> >>> Chet >>> >> >> Ok, I see. >> However, there appears to be some race condition when waiting for a >> command substitution. >> I have the attached combination of scripts. >> >> When run with >> $ bash run-until-end-of-loop.sh bash start-subst-loop.sh '2>/dev/null' >> , the script will try to launch and SIGINT subst-loop.sh repeatedly >> until the SIGINT trap is once *not* called which will happen in at most >> 200 repetitions on my machine. The script then ends after printing “end >> of loop”. The python script execute-with-sigint.py is only there to >> enable subst-loop.sh to receive a SIGINT at all. >> subst-loop.sh calls date(1) in a loop; date should have SIGINT set to >> SIG_DFL other than python initially. >> >> I analyzed the execution by inserting some debug output into the bash >> code. It seems that in the case that the SIGINT trap is not called >> subst-loop.sh gets the SIGINT while (or shortly before or after) calling >> waitpid in waitchld:jobs.c which will then return without errno == >> EINTR. The wait_sigint_handler will be called, though, and so >> wait_sigint_received will be true. > > This doesn't agree with what I see on RHEL6. I get waitpid() returning -1/ > EINTR, which bash interprets, using a heuristic, to mean that the child > blocked or caught SIGINT, in which case bash should not act as if it > received it. > > There is a small race condition here, which is very hard to close while > maintaining the desired behavior: bash only responds to SIGINT received > while waiting for a child if the child exits due to SIGINT. You appear > to have hit it: the timing of the ^C is such that waitpid returns > -1/EINTR, causing the shell to ignore the ^C. I suspect this is because > the child called exit(0) before the ^C arrived and the shell got the > signal, but the child exited successfully, causing the shell to assume > the child blocked or caught the SIGINT. > > That heuristic was developed as the result of an extensive discussion > between me and several Linux kernel developers back in 2011. You can > read that here: > > http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00050.html > http://lists.gnu.org/archive/html/bug-bash/2011-03/msg0.html > > I will look at the signal handler race condition you identified. > > Chet > Thanks. And thanks for the information. So $ bash -c 'python -c "import time; time.sleep(10)"; echo foo' outputs foo when ^Cd because bash suppresses normal SIGINT handling if the child is assumed to have handled it itself. Meanwhile $ bash -c 'sleep 10; echo foo' doesn't output foo. But custom trap handlers are called either way in set_job_status_and_cleanup. $ bash -c 'trap "echo trap; exit" INT; python -c "import time; time.sleep(10)"; echo foo' prints trap, not foo, as does $ bash -c 'trap "echo trap; exit" INT; sleep 10; echo foo' I think command substitution has two issues. Those are because set_job_and_cleanup isn't called for the single child made for comsub. (1) The bug that was fixed in the discussion you linked is still in for comsub waits: $ bash -c 'while [ "$(exec >&-; sleep 0.001)" = "" ]; do :; done' sometimes requires two ^Cs to stop. The reason seems to be the same race condition described in http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00073.html and in your and in my last mail. Checking last_command_exit_value == (128 + SIGINT) in command_substitute reflects the way the now-called child_caught_sigint was determined before the patch made during the discussion (i.e. it doesn't implement the heuristics). As a side note, http://lists.gnu.org/archive/html/bug-bash/2011-03/msg00039.html explained to me why this outcome of the race condition is at all likely. (2) Running the comsub version of one of the above scripts: $ bash -c 'trap "echo trap; exit" INT; foo="$(exec >&-; python -c "import time; time.sleep(10)")"; echo foo' yields another result (it prints foo, not trap). This is the behavior the original bug report was about and you already replied to it but is the difference between comsub children and normal ones here intended?