Test exit status misinterpreted in scripts when built without job control

2016-08-04 Thread Dan Cross
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' 
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' 
-DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' 
-DSHELL -DHAVE_CONFIG_H   -I.  -I.. -I../include -I../lib   -g -O2
uname output: Linux spitfire.my.domain 3.13.0-88-generic #135-Ubuntu SMP Wed 
Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.3
Patch Level: 30
Release Status: release

Description:
When bash is built without job control, shell scripts that use
the 'test' builtin (e.g., via '[') in conditionals may take the
wrong branch because the exit status of the test is lost.

Repeat-By:
Configure without job control, e.g., via:
./configure --prefix=/usr --bindir=/bin --without-bash-malloc \
    --disable-nls --disable-job-control
Invoke the resulting shell and run the following sequence of commands:

$ cat > foo.sh
if [ $# -lt 2 ]
then
echo "$# args is less than 2"
else
echo "$# args is not less than 2"
fi
$ chmod +x ./foo.sh
$ ./foo.sh 1 2 3 4
4 args is less than 2
$

Observe the output: '4' is not actually less than '2' yet the
script incorrectly reports it as such.

Note: we originally discovered this when porting 'bash' to a
new research operating system that does not support job control.
However, we were able to reproduce on Linux.



Re: Test exit status misinterpreted in scripts when built without job control

2016-08-04 Thread Chet Ramey
On 8/4/16 12:05 PM, Dan Cross wrote:

> Bash Version: 4.3
> Patch Level: 30
> Release Status: release
> 
> Description:
> When bash is built without job control, shell scripts that use
> the 'test' builtin (e.g., via '[') in conditionals may take the
> wrong branch because the exit status of the test is lost.
> 
> Repeat-By:
> Configure without job control.  Via e.g.,
> ./configure --prefix=/usr --bindir=/bin --without-bash-malloc 
> --disable-nls --disable-job-control
> Invoke the resulting shell and run the following sequence of commands:
> 
> $ cat > foo.sh
> if [ $# -lt 2 ]
> then
> echo "$# args is less than 2"
> else
> echo "$# args is not less than 2"
> fi
> $ chmod +x ./foo.sh
> $ ./foo.sh 1 2 3 4
> 4 args is less than 2
> $
> 
> Observe the output: '4' is not actually less than '2' yet the
> script incorrectly reports it as such.

Thanks for the report.  I took a quick look at this, and it's not disabling
job control that does it: it's disabling both job control and nls.
Disabling either one while leaving the other enabled doesn't produce this
error (which only happens in the case where you run a script with the
execute bit set without a #! line after running an executable that causes
the shell to call waitpid()).  It's a strange set of circumstances.
I'll see what I can find.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: Test exit status misinterpreted in scripts when built without job control

2016-08-04 Thread Dan Cross
On Thu, Aug 4, 2016 at 2:36 PM, Chet Ramey wrote:

> On 8/4/16 12:05 PM, Dan Cross wrote:
> > Bash Version: 4.3
> > Patch Level: 30
> > Release Status: release
> >
> > Description:
> > When bash is built without job control, shell scripts that use
> > the 'test' builtin (e.g., via '[') in conditionals may take the
> > wrong branch because the exit status of the test is lost.
> >
> > Repeat-By:
> > Configure without job control.  Via e.g.,
> > ./configure --prefix=/usr --bindir=/bin --without-bash-malloc
> > --disable-nls --disable-job-control
> > Invoke the resulting shell and run the following sequence of commands:
> >
> > $ cat > foo.sh
> > if [ $# -lt 2 ]
> > then
> > echo "$# args is less than 2"
> > else
> > echo "$# args is not less than 2"
> > fi
> > $ chmod +x ./foo.sh
> > $ ./foo.sh 1 2 3 4
> > 4 args is less than 2
> > $
> >
> > Observe the output: '4' is not actually less than '2' yet the
> > script incorrectly reports it as such.
>
> Thanks for the report.  I took a quick look at this, and it's not disabling
> job control that does it: it's disabling both job control and nls.
> Disabling either one while leaving the other enabled doesn't produce this
> error (which only happens in the case where you run a script with the
> execute bit set without a #! line after running an executable that causes
> the shell to call waitpid()).  It's a strange set of circumstances.
> I'll see what I can find.
>

Thanks, Chet. FYI, I tried building for the research kernel with NLS
enabled and am still seeing the problem. Our patch is pretty minimal: it
mostly just adds the name of the OS as supported in the various configure
scripts. We also have a requirement that strings written using 'echo' get
written with one system call, so I bypass stdio for that, and we print
another context string in addition to errno on errors. Also, I was able to
reproduce on an unpatched bash on Linux with NLS enabled:

% ../configure --prefix=/usr --bindir=/bin --without-bash-malloc \
    --disable-job-control
% grep NLS config.h
#define ENABLE_NLS 1
% make
(build output omitted for brevity)
% ./bash --noprofile --norc
$ ./foo.sh 1 2 3 4
4 args is not less than 2
$ ./foo.sh 1 2 3 4
4 args is less than 2
$ exit
%

Thanks again!

- Dan C.

(PS: If you're curious, we're porting bash to the Akaros operating system:
http://akaros.org/)
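
For anyone wanting to check a particular build against the conditions
described in this thread (a script with the execute bit set and no #! line,
run twice from the same shell), the following is a minimal sketch rather
than a definitive test. BASH_UNDER_TEST is a hypothetical variable naming
the binary to check; it is not part of the original report. It is also an
assumption that a non-interactive -c invocation exercises the same code
path as the interactive session shown above.

```shell
#!/bin/sh
# Sketch: recreate foo.sh from the report and run it twice from one shell.
# BASH_UNDER_TEST is a hypothetical name for the bash binary being checked.
BASH_UNDER_TEST="${BASH_UNDER_TEST:-bash}"

workdir=$(mktemp -d) || exit 1

# Deliberately no "#!" line: the failure described in the thread involves
# an executable script that the shell ends up running itself.
cat > "$workdir/foo.sh" <<'EOF'
if [ $# -lt 2 ]
then
echo "$# args is less than 2"
else
echo "$# args is not less than 2"
fi
EOF
chmod +x "$workdir/foo.sh"

# Run the script twice in a row, mirroring the two runs in the transcript.
output=$(cd "$workdir" && "$BASH_UNDER_TEST" --noprofile --norc \
    -c './foo.sh 1 2 3 4; ./foo.sh 1 2 3 4')
rm -rf "$workdir"

# With four arguments, a correct shell takes the else branch both times.
expected='4 args is not less than 2
4 args is not less than 2'

if [ "$output" = "$expected" ]; then
    echo "ok: both runs took the else branch"
else
    echo "MISMATCH, got:"
    echo "$output"
fi
```

To point it at a freshly built shell, set the variable when invoking the
script, e.g. BASH_UNDER_TEST=./bash followed by the script's path.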