Bizarre interaction bug involving bash w/ lastpipe + Almquist 'wait'

2020-02-06 Thread Martijn Dekker
This is probably the strangest bug (or maybe pair of bugs) I've run into 
in nearly five years of breaking shells by developing modernish.


I've traced it to an interaction between bash >= 4.2 (i.e.: bash with 
shopt -s lastpipe) and variants of the Almquist shell, at least: dash, 
gwsh, Busybox ash, FreeBSD sh, and NetBSD 9.0rc2 sh.


Symptom: if 'return' is invoked on bash in the last element of a pipe 
executed in the main shell environment, then if you subsequently 'exec' 
an Almquist shell variant so that it has the same PID, its 'wait' 
builtin breaks.


I can consistently reproduce this on Linux, macOS, FreeBSD, NetBSD 
9.0rc2, OpenBSD, and Solaris.


To reproduce this, you need bash >= 4.2, some Almquist shell variant, 
and these two test scripts:


---begin test.bash---
fn() {
: | return
}
shopt -s lastpipe || exit
fn
exec "${1:-dash}" test.ash
---end test.bash---

---begin test.ash---
echo '*ash-begin'
: &
echo '*ash-middle'
wait "$!"
echo '*ash-end'
---end test.ash---

When executing test.bash with dash, gwsh, Busybox ash, or FreeBSD sh, 
then test.ash simply waits forever on executing 'wait "$!"'.


$ bash test.bash 
*ash-begin
*ash-middle
(nothing until ^C)
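
For anyone who wants to check several shells without pressing ^C each time, the hang can be detected with a time limit. This is only a sketch: the helper name hangs_within, the 5-second limit, and the list of shell names are assumptions, and it relies on the 'timeout' utility from GNU coreutils, which exits with status 124 when it has to kill the command.

```shell
# Hypothetical helper, not part of the original report.
# Succeeds (returns 0) if the command is still running after the given
# number of seconds, i.e. 'timeout' had to kill it (exit status 124).
hangs_within() {
    secs=$1
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    [ "$?" -eq 124 ]
}

# Run the reproducer against each installed shell (names are assumptions).
# Skipped entirely if test.bash is not in the current directory.
if [ -f test.bash ]; then
    for shell in dash ash; do
        command -v "$shell" >/dev/null 2>&1 || continue
        if hangs_within 5 bash test.bash "$shell"; then
            echo "$shell: wait did not return within 5 seconds"
        else
            echo "$shell: completed"
        fi
    done
fi
```

A shell reported as "completed" may still be affected, since the behaviour can depend on timing.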

NetBSD sh behaves differently. NetBSD 8.1 sh (as installed on sdf.org 
and sdf-eu.org) seems to act completely normally, but NetBSD 9.0rc2 sh 
(on my VirtualBox test VM) segfaults. Output on NetBSD 9.0rc2:


$ bash test.bash /bin/sh
*ash-begin
*ash-middle
[1]   Segmentation fault   bash test.bash sh

I don't know if the different NetBSD sh behaviour is because the older 
NetBSD sh doesn't have the bug, or because some factor on the sdf*.org 
systems causes it to not be triggered.


To me, this smells like the use of some uninitialised value on various 
Almquist shells. Tracing that is beyond my expertise though.


Whether this also represents a bug in bash or not, I can't say. But no 
other shells trigger this that I've found, not even ksh93 and zsh which 
also execute the last element of a pipe in the main shell environment.


- Martijn

--
modernish -- harness the shell
https://github.com/modernish/modernish



Re: Bizarre interaction bug involving bash w/ lastpipe + Almquist 'wait'

2020-02-06 Thread Harald van Dijk via Bug reports for the GNU Bourne Again SHell

On 06/02/2020 16:12, Martijn Dekker wrote:
> When executing test.bash with dash, gwsh, Busybox ash, or FreeBSD sh, 
> then test.ash simply waits forever on executing 'wait "$!"'.


Nice test. bash leaves the process in a state where SIGCHLD is blocked, 
and the various ash-based shells do not unblock it. Because of that, 
they do not pick up on the fact that the child process has terminated. I 
would consider this a bug both in bash and in the ash-based shells.
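
On Linux, the blocked-signal state described above can be inspected from the shell itself, for example by placing something like the following at the top of test.ash. This is a sketch under stated assumptions: it reads the SigBlk field of /proc/PID/status (Linux-specific), it assumes SIGCHLD is signal 17 as on Linux x86 and ARM, and the function name sigchld_in_mask is made up for illustration.

```shell
# Hypothetical helper: succeeds if SIGCHLD is set in a hexadecimal signal
# mask such as the SigBlk field of /proc/PID/status.
# Assumption: SIGCHLD is signal 17, so it is bit 16 of the mask.
sigchld_in_mask() {
    [ $(( (0x$1 >> 16) & 1 )) -eq 1 ]
}

# Report the state of the current shell process, if /proc is available.
if [ -r "/proc/$$/status" ]; then
    mask=$(awk '/^SigBlk:/ { print $2 }' "/proc/$$/status")
    if [ -n "$mask" ] && sigchld_in_mask "$mask"; then
        echo "SIGCHLD is blocked"
    else
        echo "SIGCHLD is not blocked"
    fi
fi
```

On systems where SIGCHLD has a different number, the shift count would need to be adjusted.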


Cheers,
Harald van Dijk



Re: Bizarre interaction bug involving bash w/ lastpipe + Almquist 'wait'

2020-02-06 Thread Robert Elz
Date: Thu, 6 Feb 2020 19:29:41 +
From: Harald van Dijk

  | Nice test.

Yes!

  | and the various ash-based shells do not unblock it.

We do now, the fix for that will be in 9.0 when it is released.
("now" as in as of the past half hour...)

  | Because of that, 
  | they do not pick up on the fact that the child process has terminated.

It was actually a race condition, for me it 'worked' about half the time
(seems to depend whether the wait happens in the parent before or after
the sub-process exits).

kre

ps: that core dump was an "impossible to happen" condition that this
actually made happen, that will be fixed as well, both by actually now
making it impossible like it was supposed to be (by not blocking or
ignoring SIGCHLD, ever) and by testing for it happening anyway...

The secondary fix for that one is still to be committed after I investigate
some more - I know what happened, just need to make sure what will happen
now if this situation which should never occur ever does happen again.

That the 8.1 NetBSD sh seems to work is more just an artifact of how
it runs the race I believe (or guess) - the wait & process invocation code
has changed a lot in 9 (well, 9.0RC2 for now) which seems to have made the
race a close call, instead of one sided.   But that was not an artifact of
the environment for the test, it happens for me on a real -8(ish) type
system as well.

kre




Re: Bizarre interaction bug involving bash w/ lastpipe + Almquist 'wait'

2020-02-06 Thread Robert Elz
Date: Thu, 6 Feb 2020 16:12:06 +
From: Martijn Dekker
Message-ID: <10e3756b-5e8f-ba00-df0d-b36c93fa2...@inlv.org>

  | NetBSD sh behaves differently. NetBSD 8.1 sh (as installed on sdf.org 
  | and sdf-eu.org) seems to act completely normally, but NetBSD 9.0rc2 sh 
  | (on my VirtualBox test VM) segfaults. Output on NetBSD 9.0rc2:

I have updated my opinion on that, I think it is "don't have the bug",
though it is possible a blocked SIGCHLD acts differently on NetBSD than
on other systems.   On NetBSD it seems to affect nothing (the shell does
not rely upon receiving SIGCHLD so not getting it is irrelevant) and
the wait code when given an arg (as your script did) would always wait
until that process exited, and return as soon as it did.

None of that is changed in -9 ... but the wait command now has -n, which
also works with a list of pids, and while waiting for any process in its
list to exit, gets told each time a process is reaped (from lower level
code) which job that process was from (new code of mine) so it can see if
the process that completed finished one of the jobs for which it is waiting.
I wasn't expecting to see exiting children that are not the shell's children,
which is what happens here - the
	: | return
creates a child (of bash) to run the ':' command, then the function
returns without waiting for that one.  You then exec the NetBSD shell,
which inherits that child (a child of the same process) but is unaware of
it.   If that one happens to exit while the ash script running on the
NetBSD sh is doing the wait command, core would dump.   (Fix for that is
now in the tree).   If the bash invoked ':' command exited some other time
and was noticed (eg: between commands) as having finished, it would simply
have been ignored.   I saw both happen.
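
The inheritance described above (a child started before 'exec' remains a child of the same PID afterwards) can be observed without bash or lastpipe. The following is only an illustrative sketch: the function name list_inherited and the 2-second sleep are assumptions, and it relies on 'ps' accepting the POSIX options -A and -o.

```shell
# Script for the exec'd shell: list every process whose parent is this PID.
inner='ps -A -o pid= -o ppid= -o comm= | awk -v p="$$" "\$2 == p"'

# Hypothetical demonstration: start a background child, then replace the
# shell with a fresh one via exec. The new shell has the same PID, so the
# kernel still treats 'sleep' as its child, although the new shell has no
# job table entry for it.
list_inherited() {
    sh -c 'sleep 2 >/dev/null & exec sh -c "$1"' sh "$inner"
}

list_inherited
```

The listing should include the 'sleep' process, and it may also show the 'ps' and 'awk' processes started by the new shell itself.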

kre