Re: SIGINT handling

2015-09-20 Thread Stephane Chazelas
2015-09-19 21:28:24 -0400, Chet Ramey:
> On 9/19/15 5:31 PM, Stephane Chazelas wrote:
> > 2015-09-19 16:42:28 -0400, Chet Ramey:
> > [...]
> >> I'm surprised you've managed to avoid the dozen or so discussions on the
> >> topic.
> >>
> >> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html
> > [...]
> > 
> > Thanks for the links. I still think the comments on the second
> > article I sent
> > (http://thread.gmane.org/gmane.comp.shells.bash.bugs/24178/focus=24183)
> > still hold though and from a quick read I don't see those points
> > being mentioned in the past discussions (but that was a quick
> > read).
> > 
> > I notice that you mention the race conditions have been fixed,
> > but I'm still seeing some non-deterministic behaviour.
> 
> I can't reproduce this on Mac OS X and RHEL 6 and 7, the systems I have
> readily available today.
> 
> The shell notes when it sees SIGINT and whether or not waitpid returns
> -1/EINTR.  If the sleep exits due to SIGINT, even after the waitpid
> returns -1, the shell assumes it didn't catch and handle the SIGINT and
> the shell calls the trap handler.
[...]

To clarify,

In

bash -c 'sh -c "trap exit INT; sleep 99; :"; echo hi'

The command under test is "bash", not "sh". The "sh" is just
there as a cmd that does exit() upon receiving SIGINT.

It's just:

bash -c 'cmd; echo hi'

You can replace "cmd" with:

perl -e '$SIG{INT}= sub{exit}; sleep'

(or

mksh -c 'sleep 10; :'

(which does an exit(130) upon receiving SIGINT))

The problem here is that when you press CTRL-C, SIGINT is sent
to all the processes in the process group, so to "bash" and
"cmd".

Now, bash works as expected only if it handles its own SIGINT
before the child has caught its own one and exited.

When the above code exits without printing "hi", we see this
call stack for instance (breakpoint on kill() in gdb):

#0  kill () at ../sysdeps/unix/syscall-template.S:81
#1  0x0045dd8e in termsig_handler (sig=) at sig.c:588
#2  0x0045ddef in termsig_handler (sig=) at sig.c:554
#3  0x004466bb in set_job_status_and_cleanup (job=0) at jobs.c:3539
#4  waitchld (block=block@entry=1, wpid=20802) at jobs.c:3316
#5  0x0044733b in wait_for (pid=20802) at jobs.c:2485
#6  0x00437992 in execute_command_internal 
(command=command@entry=0x70aa48, asynchronous=asynchronous@entry=0, 
pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
fds_to_close=fds_to_close@entry=0x70bb68) at execute_cmd.c:829
#7  0x00437b0e in execute_command (command=0x70aa48) at 
execute_cmd.c:390
#8  0x00435f23 in execute_connection (fds_to_close=0x70bb48, 
pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bb08) at execute_cmd.c:2494
#9  execute_command_internal (command=0x70bb08, 
asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, 
pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bb48)
at execute_cmd.c:945
#10 0x0047955b in parse_and_execute (string=, 
from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at 
evalstring.c:387
#11 0x004205d7 in run_one_command (command=) at 
shell.c:1348
#12 0x0041f524 in main (argc=3, argv=0x7fffe198, 
env=0x7fffe1b8) at shell.c:695

That is, SIGINT is being handled *after* the SIGINT handler has
been restored to its default of exiting the shell.

Now, I'm not sure how to best fix that as I suppose we don't get
any guarantee of when SIGINT will be delivered (it may be why
ksh93 ignores SIGINT altogether and relies solely on
WIFSIGNALED)

The above scenario suggests SIGCHLD is being delivered before
SIGINT which is strange. I'd expect SIGINT to be inserted by the
kernel in both cmd and bash queues upon CTRL-C, and the SIGCHLD
would necesarily come after those SIGINT. Could it be that
SIGCHLD jumps the queue?

Note that I'm not seeing that as often on every system. It seems
I can make it more likely by making the system busier.

-- 
Stephane



Re: SIGINT handling

2015-09-20 Thread Stephane Chazelas
[...]
> When the above code exits without printing "hi", we see this
> call stack for instance (breakpoint on kill() in gdb):
> 
> #0  kill () at ../sysdeps/unix/syscall-template.S:81
> #1  0x0045dd8e in termsig_handler (sig=) at sig.c:588
> #2  0x0045ddef in termsig_handler (sig=) at sig.c:554
> #3  0x004466bb in set_job_status_and_cleanup (job=0) at jobs.c:3539
> #4  waitchld (block=block@entry=1, wpid=20802) at jobs.c:3316
> #5  0x0044733b in wait_for (pid=20802) at jobs.c:2485
> #6  0x00437992 in execute_command_internal 
> (command=command@entry=0x70aa48, asynchronous=asynchronous@entry=0, 
> pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
> fds_to_close=fds_to_close@entry=0x70bb68) at execute_cmd.c:829
> #7  0x00437b0e in execute_command (command=0x70aa48) at 
> execute_cmd.c:390
> #8  0x00435f23 in execute_connection (fds_to_close=0x70bb48, 
> pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bb08) at 
> execute_cmd.c:2494
> #9  execute_command_internal (command=0x70bb08, 
> asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, 
> pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bb48)
> at execute_cmd.c:945
> #10 0x0047955b in parse_and_execute (string=, 
> from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at 
> evalstring.c:387
> #11 0x004205d7 in run_one_command (command=) at 
> shell.c:1348
> #12 0x0041f524 in main (argc=3, argv=0x7fffe198, 
> env=0x7fffe1b8) at shell.c:695
> 
> That is, SIGINT is being handled *after* the SIGINT handler has
> been restored to its default of exiting the shell.
[...]

Sorry, please disregard that.

I thought the termsig_handler was being invoked upon SIGINT as
the SIGINT handler, but it is being called explicitely by
set_job_status_and_cleanup so the problem is elsewhere.

child_caught_sigint is 0 while if I understand correctly it
should be 1 for a cmd that calls exit() upon SIGINT. So that's
probably probably where we should be looking.

-- 
Stephane



Re: null ptr deref and segfault in parameter_brace_transform.isra.17 () at subst.c:6827 (bash 4.4.0(1)-beta)

2015-09-20 Thread Chet Ramey
On 9/20/15 12:17 AM, Brian Carpenter wrote:
> I found another null ptr deref and segfault. This only seems to affect bash
> 4.4.0 as 4.2.37(1)-release and 4.2.37(1)-release only return a 'bad
> substitution' error message.

Thanks for the report.  This will be fixed in the next release of bash.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/



Re: SIGINT handling

2015-09-20 Thread Stephane Chazelas
2015-09-20 17:12:45 +0100, Stephane Chazelas:
[...]
> I thought the termsig_handler was being invoked upon SIGINT as
> the SIGINT handler, but it is being called explicitely by
> set_job_status_and_cleanup so the problem is elsewhere.
> 
> child_caught_sigint is 0 while if I understand correctly it
> should be 1 for a cmd that calls exit() upon SIGINT. So that's
> probably probably where we should be looking.
[...]

I had another look.

If we're to beleive gdb, child_caught_sigint is 0 because
waitpid() returns without EINTR even though wait_sigint_received
is 1.

The only reasonable explanation I can think of is that the child
handles its SIGINT first, exits which updates its state and
causes bash the parent to be scheduled, and waitpid() returns
(without EINT) and after that bash's SIGINT handler kicks in too
late.

Anyway, this patch makes the problem go away for me (and
addresses my problem #2 about exit code 130 not being treated
as an interrupted child). It might break things though if there
was a real reason for bash to check for waitpid()'s EINTR.

With that patch applied,

./bash -c 'sh -c "trap exit INT; sleep 120; :"; echo hi'
./bash -c 'mksh -c "sleep 120; :"; echo hi'

Does *not* output "hi" (as mksh or sh do a exit(130) which is
regarded as them being "interrupted by that SIGINT", or at least
reporting that the child they want to report the status of
(sleep) has been killed by a SIGINT).

And 

./bash -c 'sh -c "trap exit\ 0 INT; sleep 120; :"; echo hi'

*consistently* outputs "hi" (the zero exit status cancels the
aborting of bash).

--- jobs.c~ 2015-09-20 20:03:14.692119372 +0100
+++ jobs.c  2015-09-20 20:37:01.510892045 +0100
@@ -3257,21 +3257,15 @@ itrace("waitchld: waitpid returns %d blo
   CHECK_TERMSIG;
   CHECK_WAIT_INTR;
 
-  /* If waitpid returns -1/EINTR and the shell saw a SIGINT, then we
-assume the child has blocked or handled SIGINT.  In that case, we
-require the child to actually die due to SIGINT to act on the
-SIGINT we received; otherwise we assume the child handled it and
-let it go. */
-  if (pid < 0 && errno == EINTR && wait_sigint_received)
-   child_caught_sigint = 1;
-
   if (pid <= 0)
continue;   /* jumps right to the test */
 
-  /* If the child process did die due to SIGINT, forget our assumption
-that it caught or otherwise handled it. */
-  if (WIFSIGNALED (status) && WTERMSIG (status) == SIGINT)
-child_caught_sigint = 0;
+  /* If we received a SIGINT, but the child did not die of a SIGINT and
+ did not report a 128+SIGINT exit status, we assume the child handled
+ it and let it go. */
+  child_caught_sigint = wait_sigint_received &&
+   ! ((WIFSIGNALED (status) && WTERMSIG (status) == SIGINT) ||
+  (WIFEXITED (status) && WEXITSTATUS (status) == 128 + SIGINT));
 
   /* children_exited is used to run traps on SIGCHLD.  We don't want to
  run the trap if a process is just being continued. */

-- 
Stephane



Segfault using HISTTIMEFORMAT

2015-09-20 Thread rens
When using HISTTIMEFORMAT and a timestamp which is out of range bash seems to 
segfault. I've tested this with bash version 4.3.30(1)-release 
(x86_64-pc-linux-gnu) on Debian 8 stable.

Here is how to reproduce:
* Change (or add) a timestamp in .bash_history to:
#99

* Start a new bash shell

* Run the following commands:
export HISTTIMEFORMAT="[%F %T] "
history

When history gets executed bash segfaults.

--
Rens Hordijk


Re: Segfault using HISTTIMEFORMAT

2015-09-20 Thread Chet Ramey
On 9/20/15 4:59 PM, r...@endoria.net wrote:
> When using HISTTIMEFORMAT and a timestamp which is out of range bash seems to 
> segfault. I've tested this with bash version 4.3.30(1)-release 
> (x86_64-pc-linux-gnu) on Debian 8 stable.
> 
> Here is how to reproduce:
> * Change (or add) a timestamp in .bash_history to:
> #99
> 
> * Start a new bash shell
> 
> * Run the following commands:
> export HISTTIMEFORMAT="[%F %T] "
> history
> 
> When history gets executed bash segfaults.

It segfaults in strftime() in the C library, which must not be prepared to
handle LONG_MAX or LONG_MIN on that system.  I'll take a look and see if
we can trap that overflow, though you'd like to see strftime handle it
better.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/