AIX and Interix also do early PID recycling.

2012-07-24 Thread Michael Haubenwallner
Configuration Information [Automatically generated, do not change]:
Machine: powerpc
OS: aix5.3.0.0
Compiler: powerpc-ibm-aix5.3.0.0-gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='powerpc' 
-DCONF_OSTYPE='aix5.3.0.0' -DCONF_MACHTYPE='powerpc-ibm-aix5.3.0.0' 
-DCONF_VENDOR='ibm' -DLOCALEDIR='/tools/haubi/gentoo/sauxz3/usr/share/locale' 
-DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H   -I. -I./include -I. -I./include 
-I./lib  
-DDEFAULT_PATH_VALUE='/tools/haubi/gentoo/sauxz3/usr/sbin:/tools/haubi/gentoo/sauxz3/usr/bin:/tools/haubi/gentoo/sauxz3/sbin:/tools/haubi/gentoo/sauxz3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
 
-DSTANDARD_UTILS_PATH='/tools/haubi/gentoo/sauxz3/bin:/tools/haubi/gentoo/sauxz3/usr/bin:/tools/haubi/gentoo/sauxz3/sbin:/tools/haubi/gentoo/sauxz3/usr/sbin:/bin:/usr/bin:/sbin:/usr/sbin'
 -DSYS_BASHRC='/tools/haubi/gentoo/sauxz3/etc/bash/bashrc' 
-DSYS_BASH_LOGOUT='/tools/haubi/gentoo/sauxz3/etc/bash/bash_logout' 
-DNON_INTERACTIVE_LOGIN_SHELLS -DSSH_SOURCE_BASHRC 
-I/tools/haubi/gentoo/sauxz3/usr/include -g -O2
uname output: AIX sauxz3 3 5 0A03D600
Machine Type: powerpc-ibm-aix5.3.0.0

Bash Version: 4.2
Patch Level: 36
Release Status: release

Description:
On AIX (5.3, 6.1, 7.1), as well as on Interix (any version), I encounter
a race condition in code similar to:

if grep "unwanted" /some/nonexistent/filename
then
  echo "bad"
  exit 1
fi
echo "good"

Sometimes it prints "bad" when it should always print "good".
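For reference, grep's exit status makes the expected behaviour unambiguous: per POSIX it is 0 for a match, 1 for no match, and greater than 1 for an error such as a missing file. A minimal sketch (using the hypothetical filename from above):

```shell
#!/usr/bin/env bash
# grep on a nonexistent file is an error: exit status > 1 (2 on GNU grep),
# so the "if" condition below is false and "good" must always be printed.
grep "unwanted" /some/nonexistent/filename 2>/dev/null
echo "grep exit status: $?"

if grep "unwanted" /some/nonexistent/filename 2>/dev/null
then
  echo "bad"
  exit 1
fi
echo "good"
```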

As this is part of some large build process, and occurs every once in a
while, I've been unable to create a small testcase.

However, I've found a /very/ similar problem description for Cygwin,
even if I don't understand why defining RECYCLES_PIDS shouldn't be enough
here:
http://www.cygwin.com/ml/cygwin/2004-09/msg00882.html

Also, with an older version of libtool, I've had this problem too:
http://lists.gnu.org/archive/html/bug-bash/2008-07/msg00117.html

While this doesn't happen with more recent libtool any more,
I've found an identical problem description with Cygwin again,
even if that test script doesn't expose the problem to me:
http://sources.redhat.com/ml/cygwin/2002-08/msg00449.html

However, here's a rough script to show when PIDs get recycled.
---
  #! /usr/bin/env bash

  count=0
  min=
  max=
  while true
  do
    /bin/true &
    last=$!
    [[ ${min:=${last}} -gt ${last} ]] && min=${last}
    [[ ${max:=${last}} -lt ${last} ]] && max=${last}
    [[ ${#last} > 4 ]] && used=used_${last::((${#last}-4))} || used=used_0

    if [[ ${!used} == *" ${last} "* ]]; then
      break
    fi
    (( count+=1 ))
    eval "${used}+=' ${last} '"
    if [[ "${count}" == *000 ]]; then
      echo ${count}
    fi
  done

  echo "reused pid ${last} (min ${min}, max ${max}) after ${count} trials."
---

On AIX, this script shows something like:
  "reused pid 121692 (min 88110, max 121854) after 254 trials."

On Interix, this is something like:
  "reused pid 1805 (min 135, max 2107) after 121 trials."

Running this script multiple times in parallel reduces the number of trials,
especially on Interix.

Linux, HP-UX and Solaris need something near 32k trials, depending
on how the kernel is configured (Linux: kernel.pid_max).
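On Linux the PID ceiling can be inspected directly (a sketch; assumes procfs is mounted):

```shell
#!/usr/bin/env bash
# Largest PID the Linux kernel hands out before wrapping around;
# 32768 is the historical default, tunable via sysctl kernel.pid_max.
pid_max=$(cat /proc/sys/kernel/pid_max)
echo "kernel.pid_max = ${pid_max}"
```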

Repeat-By:
Unable to repeat in a small testcase, but adding some debug fprintf's
to bash's jobs.c and execute_cmd.c near fork() and waitpid() allowed
me to identify PID recycling as the root problem.

Fix:
Define RECYCLES_PIDS for AIX and Interix too (like for Cygwin and LynxOS).



Re: AIX and Interix also do early PID recycling.

2012-07-24 Thread Michael Haubenwallner

On 07/24/2012 05:49 PM, Greg Wooledge wrote:
> On Tue, Jul 24, 2012 at 05:03:36PM +0200, michael.haubenwall...@salomon.at 
> wrote:
>> Description:
>>  On AIX (5.3, 6.1, 7.1), as well as on Interix (any version) I do 
>> encounter
>>  some race condition in a code similar to:
>>  if grep "unwanted" /some/nonexistent/filename
>>  then
>>echo "bad"
>>exit 1
>>  fi
>>  echo "good"

There is nothing multiprocessing or asynchronous in this script snippet; there
isn't even a pipe or subshell anywhere. Copied and pasted, the actual code is:

# Verify that the libtool files don't contain bogus $D entries.
local abort=no gentoo_bug=no
for a in "${ED}"usr/lib*/*.la ; do
    s=${a##*/}
    if grep -qs "${D}" "${a}" ; then
        vecho -ne '\a\n'
        eqawarn "QA Notice: ${s} appears to contain PORTAGE_TMPDIR paths"
        abort="yes"
    fi
done
[[ ${abort} == "yes" ]] && die "soiled libtool library files found"

When it erroneously fails, the message is "QA Notice: *.la appears to
contain ...", however there is no file actually named '*.la'.

Agreed, one bug here is that nullglob should be set so grep is not run at all
when there is no *.la file, but that would just hide the bash bug...
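That nullglob fix would look something like this (a sketch with a hypothetical empty directory, not the actual ebuild code): with the option set, an unmatched *.la glob expands to nothing, so the loop body, and therefore grep, never runs.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)            # empty directory: no *.la files at all

shopt -s nullglob           # unmatched globs expand to zero words
count=0
for a in "${dir}"/*.la ; do
  count=$((count + 1))
  grep -qs "whatever" "${a}"
done
echo "loop iterations: ${count}"    # 0 - grep never ran

rmdir "${dir}"
```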

>>  Sometimes it does "bad" while it should do "good" always.
> 
> If that happens, then I don't see how it is related to recycling PIDs.
> In fact, if grep is failing to produce the correct result, it is a
> problem with your OS's implementation of grep, and not with bash.

Adding some debug-printfs to bash itself around fork, execve, waitpid shows:

Bash usually does fork()+execve("grep"), as well as waitpid(-1, ...).

Whenever waitpid() returns this "grep"'s PID, the reported exit status
always is 2, even when bash goes "bad"...

Adding more debug-printfs to bash's wait_for() and
execute_command_internal() shows:

Usually, execute_command_internal() does wait_for(this grep's PID) before
executing anything else, correctly evaluating the return value as "not true"
and skipping the "bad" part.

But when there was some previous but unrelated command for which fork()
returned the same PID as for this "grep", execute_command_internal() does
/not/ call wait_for() at all, because last_made_pid is equal to last_pid,
and the "bad" path is taken instead, as another command's exit status is
evaluated in place of this grep's.

However, in a subsequent wait_for(another child), waitpid() does report
exit status 2 for this grep's PID, but bash has already gone "bad" and
ignores that exit status.

>>  [[ ${#last} > 4 ]] && used=used_${last::((${#last}-4))} || used=used_0
> 
> That [[ ${#last} > 4 ]] check is incorrect.  You're doing a string
> comparison there; [[ 10 > 4 ]] is false.  Either use ((...)) or use -gt.

Indeed! (but irrelevant here - it is just a performance optimisation)
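Greg's point is easy to demonstrate: inside [[ ]], `>` is a lexicographic string comparison, so "10" sorts before "4", while arithmetic contexts compare numerically.

```shell
#!/usr/bin/env bash
# String comparison: "10" < "4" because '1' sorts before '4'.
[[ 10 > 4 ]]   && echo "string:  10 > 4" || echo "string:  10 < 4"   # string:  10 < 4
# Arithmetic comparison: 10 is numerically greater than 4.
(( 10 > 4 ))   && echo "arith:   10 > 4"                             # arith:   10 > 4
# -gt also compares numerically, even inside [[ ]].
[[ 10 -gt 4 ]] && echo "numeric: 10 -gt 4"                           # numeric: 10 -gt 4
```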

> In any case, if your script breaks because PIDs are recycled sooner than
> you expect, then it is a bug in your script, and not in bash.

Neither I nor my script expects anything about PIDs at all here.

> (What
> would you expect bash to do about it in the first place?)  It may also
> interest you to know that there are some operating systems that use
> random PID allocation, instead of sequential (OpenBSD for example).

PID randomisation isn't a problem at all, as long as a previously used PID
is not reused too early.

> http://mywiki.wooledge.org/ProcessManagement has some tips on how to
> deal with multiple processes.

Interesting page, but there's nothing that applies here.

Thank you anyway!
/haubi/



Re: AIX and Interix also do early PID recycling.

2012-07-25 Thread Michael Haubenwallner

On 07/25/2012 03:05 AM, Chet Ramey wrote:
> Bash assumes that there's a PID space at least as
> large as CHILD_MAX, and that the kernel will use all of it before reusing
> any PID in the space.  Posix says that shells must remember up to CHILD_MAX
> statuses of terminated asynchronous children (the description of `wait'),
> so implicitly the kernel is not allowed to reuse process IDs until it has
> exhausted CHILD_MAX PIDs.

What about grandchildren?
They count for the kernel, but not for the top-level shell...

> The description of fork() doesn't mention this,
> however.  The Posix fork() requirement that the PID returned can't
> correspond to an existing process or process group is not sufficient to
> satisfy the requirement on `wait'.

OTOH, AFAICT, as long as a PID isn't waitpid()ed for, it isn't reused by fork().
However, I'm unable to find that in the POSIX spec.

> Bash holds on to the status of all terminated processes, not just
> background ones, and only checks for the presence of a newly-forked PID
> in that list if the list size exceeds CHILD_MAX.  One of the results of
> defining RECYCLES_PIDS is that the check is performed on every created
> process.

What if the shell did not do waitpid(-1), but waitpid(known-child-PID)?
That would mean calling waitpid(synchronous-child-PID) immediately, and
waitpid(asynchronous-child-PID) upon some "wait $!" shell command, falling
back to waitpid(-1) when no PID is passed to "wait".
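In shell terms the idea corresponds to waiting for one specific child, whose status then cannot be confused with any other child's (a minimal sketch):

```shell
#!/usr/bin/env bash
( exit 17 ) &        # asynchronous child with a known exit status
pid=$!

wait "${pid}"        # wait for exactly this PID, like waitpid(pid, ...)
status=$?
echo "child ${pid} exited with status ${status}"    # ... status 17
```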

> I'd be interested in knowing the value of CHILD_MAX (or even `ulimit -c')
> on the system where you're seeing this problem.

The AIX 6.1 I've debugged on has:
  #define CHILD_MAX 128
  #define _POSIX_CHILD_MAX 25
  sysconf(_SC_CHILD_MAX) = 1024

  $ ulimit -H -c -u
  core file size  (blocks, -c) unlimited
  max user processes  (-u) unlimited

  $ ulimit -S -c -u
  core file size  (blocks, -c) 1048575
  max user processes  (-u) unlimited

The Interix 6.1 where we see similar-looking stability problems has:
  CHILD_MAX not defined
  #define _POSIX_CHILD_MAX 6
  sysconf(_SC_CHILD_MAX) = 512

  $ ulimit -H -c -u
  core file size (blocks, -c) unlimited
  max user processes (-u) 512

  $ ulimit -S -c -u
  core file size (blocks, -c) unlimited
  max user processes (-u) 512
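Both limits can be queried from the shell on most systems, which makes comparing the static and dynamic values easy (a sketch; the printed values are system-dependent):

```shell
#!/usr/bin/env bash
# Dynamic limit: the sysconf(_SC_CHILD_MAX) value that bash's
# getmaxchild() ends up preferring.
getconf CHILD_MAX

# Per-user process limit, which AIX's sysconf(_SC_CHILD_MAX) resembles.
ulimit -u
```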

> The case where last_made_pid is equal to last_pid is a problem only when
> the PID space is extremely small -- on the order of, say, 4 -- as long as
> the kernel behaves as described above.

I'm going to run this build job with 'truss -t kfork' again, to eventually
find some too-small count of distinct PIDs before the kernel recycles them...

Anyway - defining RECYCLES_PIDS for that AIX 6.1 has reduced the error rate
for this one build job from ~37 failures to 0 in 50 runs.

/haubi/



Re: AIX and Interix also do early PID recycling.

2012-07-25 Thread Michael Haubenwallner

On 07/25/2012 09:59 AM, Michael Haubenwallner wrote:
> On 07/25/2012 03:05 AM, Chet Ramey wrote:
>> Bash holds on to the status of all terminated processes, not just
>> background ones, and only checks for the presence of a newly-forked PID
>> in that list if the list size exceeds CHILD_MAX.

> The AIX 6.1 I've debugged on has:
>   #define CHILD_MAX 128

> I'm going to run this build job with 'truss -t kfork' again, to eventually find
> some too small count of different PIDs before PID-recycling by the kernel...

Tracing shows:

The minimum fork count (including grandchildren to any depth) before PID
recycling starts looks like 255 (once), but usually is 256 or more.

However, one process does see a PID recycled after *at least* 128 forks -
exactly the value of CHILD_MAX.

First thought is some off-by-one bug, but reducing js.c_childmax in jobs.c
(in both places) by one doesn't help.

Investigating further... any hints what to look out for?

/haubi/



Re: AIX and Interix also do early PID recycling.

2012-07-25 Thread Michael Haubenwallner

On 07/25/2012 02:14 PM, Greg Wooledge wrote:
> On Wed, Jul 25, 2012 at 09:59:28AM +0200, Michael Haubenwallner wrote:
>> OTOH, AFAICT, as long as a PID isn't waitpid()ed for, it isn't reused by 
>> fork().
>> However, I'm unable to find that in the POSIX spec.
> 
> A process that hasn't been waited for should become a zombie, which
> should be sufficient to prevent its PID being reused.  Are you saying
> that AIX and Interix don't have zombies?

Nope. My thought was that bash could eventually postpone waiting for a specific
child PID until required by the driving shell script. That is: immediately for
synchronous children to set $?, and on "wait" for asynchronous children. The
idea was to render storing CHILD_MAX return values obsolete.
However, I'm investigating why respecting CHILD_MAX in bash doesn't work when
the kernel starts reusing PIDs after CHILD_MAX different ones.

/haubi/




Re: AIX and Interix also do early PID recycling.

2012-07-25 Thread Michael Haubenwallner

On 07/25/2012 03:20 PM, Michael Haubenwallner wrote:
> On 07/25/2012 09:59 AM, Michael Haubenwallner wrote:
>> On 07/25/2012 03:05 AM, Chet Ramey wrote:
>>> Bash holds on to the status of all terminated processes, not just
>>> background ones, and only checks for the presence of a newly-forked PID
>>> in that list if the list size exceeds CHILD_MAX.
> 
>> The AIX 6.1 I've debugged on has:
>>   #define CHILD_MAX 128
>>   #define _POSIX_CHILD_MAX 25
>>   sysconf(_SC_CHILD_MAX) = 1024

> Tracing shows:
> 
> The minimum fork count (including grand-childs to any depth) before PID
> recycling starts looks like 255 (once), but usually 256 and more.
> 
> However, one process does see a PID recycled after *at least* 128 forks,
> that is exactly the value of CHILD_MAX.

Got it: The value used for js.c_childmax isn't 128, but 1024.

In lib/sh/oslib.c, getmaxchild() prefers sysconf(_SC_CHILD_MAX) over CHILD_MAX
over MAXUPRC.

But sysconf(_SC_CHILD_MAX) returns the number of "processes per real user id"
(similar to ulimit -u), rather than the value of CHILD_MAX (whenever defined).

For Interix, things are different though:
There is neither CHILD_MAX nor MAXUPRC defined, and sysconf(_SC_CHILD_MAX)
returns 512, but PIDs start to be recycled at ~120 already...

Any idea about the "correct" fix for getmaxchild() across platforms?

/haubi/



Re: AIX and Interix also do early PID recycling.

2012-07-25 Thread Michael Haubenwallner

On 07/25/2012 04:50 PM, Chet Ramey wrote:
>> The AIX 6.1 I've debugged on has:
>>   #define CHILD_MAX 128
>>   #define _POSIX_CHILD_MAX 25
>>   sysconf(_SC_CHILD_MAX) = 1024

> Bash prefers sysconf(_SC_CHILD_MAX) and will use it over the other
> defines (lib/sh/oslib.c:getmaxchild()).  I don't know why AIX chooses
> to return a different value via sysconf than it defines for CHILD_MAX,
> especially when it seems to use the CHILD_MAX value to decide when it
> can recycle the PID space.

Well, _SC_CHILD_MAX is documented across platforms as:
(Linux)   "The max number of simultaneous processes per user ID."
(HP-UX)   "Maximum number of simultaneous processes per user ID."
(Solaris) "Max processes allowed to a UID"
(AIX) "Specifies the number of simultaneous processes per real user ID."
(Interix) "Maximum number of simultaneous processes per user ID."

Also, one Linux machine here actually shows an _SC_CHILD_MAX value equal to
kernel.pid_max (32768 here), so even Linux could see this problem in theory,
because PIDs really are recycled before kernel.pid_max is reached.

> And I suspect that the single change of significance is to not check
> against the childmax value when deciding whether or not to look for and
> remove this pid from the list of saved termination status values.

Agreed - but is this still different from defining RECYCLES_PIDS then?

Thank you!
/haubi/



Re: AIX and Interix also do early PID recycling.

2012-07-26 Thread Michael Haubenwallner


On 07/25/12 19:06, Chet Ramey wrote:

Well, _SC_CHILD_MAX is documented across platforms as:


Heck, even POSIX specifies CHILD_MAX as:
"Maximum number of simultaneous processes per real user ID."


Also, one Linux machine actually shows the _SC_CHILD_MAX value equal to
kernel.pid_max (32768 here),


That's interesting, since Posix describes sysconf() as simply a way to
retrieve values from limits.h or unistd.h that one wishes to get at
run time rather than compile time.  And interesting that it establishes a
correspondence between CHILD_MAX and _SC_CHILD_MAX.


There's this one sentence in the sysconf spec:
  The value returned shall not be more restrictive than the corresponding
  value described to the application when it was compiled with the
  implementation's <limits.h> or <unistd.h>.

So CHILD_MAX is the /minimum/ value sysconf(_SC_CHILD_MAX) may return.


And I suspect that the single change of significance is to not check
against the childmax value when deciding whether or not to look for and
remove this pid from the list of saved termination status values.


Agreed - but is this still different to defining RECYCLES_PIDS then?


It is not.  It is one of the things that happens when you define
RECYCLES_PIDS.  The question is whether or not that is the single thing
that makes a difference in this case.  If it is, there is merit in
removing the check against js.c_childmax entirely or making it dependent
on something else.


IMO, checking against js.c_childmax (sysconf's value) still makes sense
to have some upper limit, while being large enough to be useful.
However, defining the "useful" value is up to the kernel, which guarantees
at least the static CHILD_MAX (or _POSIX_CHILD_MAX), while providing more
than 100 in practice across various platforms.

However, having the "useful" value unavailable to bash feels like
rendering the RECYCLES_PIDS implementation mandatory for /any/ platform.

/haubi/



Re: AIX and Interix also do early PID recycling.

2012-07-26 Thread Michael Haubenwallner


On 07/26/12 20:29, Chet Ramey wrote:

> OK, we have some data, we have a hypothesis, and we have a way to test it.
> Let's test it.
> 
> Michael, please apply the attached patch, disable RECYCLES_PIDS, and run
> your tests again.  This makes the check for previously-saved exit statuses
> unconditional.
> 
> Let's see if this is the one change of significance.


Nope, that doesn't fix the problem, even if it might be necessary anyway
to not mix up stored exit statuses.

Somehow this is related to last_made_pid being preserved across children
created for $() or ``.

In execute_command_internal(), last_made_pid still holds the 128-forks-old
(first) PID, causing wait_for() to not be run when execute_simple_command()
gets the same PID again.
However, I've been able to create a short testcase now:

---
#! /bin/bash

/bin/false # make first child

for x in {1..127}; do
  x=$( : ) # make CHILD_MAX-1 more children
done

# breaks when first child's PID is recycled here
if /bin/false; then
  echo BOOM
  exit 1
fi

echo GOOD
---

/haubi/




Re: AIX and Interix also do early PID recycling.

2012-07-27 Thread Michael Haubenwallner

On 07/26/2012 11:37 PM, Michael Haubenwallner wrote:
> On 07/26/12 20:29, Chet Ramey wrote:
>> OK, we have some data, we have a hypothesis, and we have a way to test it.
>> Let's test it.
>>
>> Michael, please apply the attached patch, disable RECYCLES_PIDS, and run
>> your tests again.  This makes the check for previously-saved exit statuses
>> unconditional.
>>
>> Let's see if this is the one change of significance.
> 
> Nope, doesn't fix the problem, even if it might be necessary though
> to not mix up stored exitstates.

About mixing up stored exit statuses: This patch isn't enough to make the
testcase below work reliably - it also is necessary to drop the pid_wrap
detection, as PIDs aren't guaranteed to be (re)used in any particular order.
However, this highly depends on the machine's load.

With the attached patch I haven't been able to break the testcase below so far
on that AIX 6.1 box here.

But the other one, using the $() children, still fails.

---
for job in {128..511} {0..127}
do
  if [[ ${job} -lt 128 ]]; then
( exit 17 ) & 
  else  
( exit 1 ) &
  fi
  eval "pidof_${job}=\$!"
done

for job in {127..0}; do
  pid=pidof_${job}
  pid=${!pid}

  wait ${pid}
  ret=$?

  if [ ${ret} -ne 17 ]; then
echo "job ${job} failed with ret ${ret}" 
  fi
done
---

Thank you!
/haubi/ (away for next 3 weeks)
*** jobs.c.orig	2012-07-27 15:29:54.283862562 +0200
--- jobs.c	2012-07-27 15:29:51.960238374 +0200
***
*** 1897,1903 
--- 1897,1906 
  #endif
  
+ #if 0
if (pid_wrap > 0)
+ #endif
  	delete_old_job (pid);
  
+ #if 0
  #if !defined (RECYCLES_PIDS)
/* Only check for saved status if we've saved more than CHILD_MAX
***
*** 1905,1908 
--- 1908,1912 
if ((js.c_reaped + bgpids.npid) >= js.c_childmax)
  #endif
+ #endif
  	bgp_delete (pid);		/* new process, discard any saved status */
  


Re: AIX and Interix also do early PID recycling.

2012-08-20 Thread Michael Haubenwallner


On 07/29/2012 12:46 AM, Chet Ramey wrote:
> On 7/27/12 9:50 AM, Michael Haubenwallner wrote:
> 
>> With attached patch I haven't been able to break the testcase below so far
>> on that AIX 6.1 box here.
>>
>> But still, the other one using the $()-childs still fails.
> 
> Try the attached patch for that.

Collecting the patches and cleaning up now unused code, attached patch
seems to fix both CHILD_MAX related problems on that AIX box here now,
without using the RECYCLES_PIDS workaround.

Thank you!

/haubi/
Bash assumes that PIDs aren't reused before sysconf(_SC_CHILD_MAX) immediate
children (the dynamic value) have been forked, and that PID values ascend and
wrap around.

However, as specified by POSIX, conforming kernels actually guarantee only
CHILD_MAX immediate children (the static value) before reusing PIDs.
Additionally, AIX (at least) does not guarantee ascending PID values at all.
Actually, AIX reuses PIDs after its CHILD_MAX value of 128, in somewhat random
order in some configuration or load cases, resulting in race conditions like
these:
http://lists.gnu.org/archive/html/bug-bash/2008-07/msg00117.html

This looks like a similar problem to the one on Cygwin, where RECYCLES_PIDS is
defined as the workaround, but that isn't really correct for AIX (and maybe
Interix):
http://www.cygwin.com/ml/cygwin/2004-09/msg00882.html
http://www.cygwin.com/ml/cygwin/2002-08/msg00449.html
*** jobs.c.orig	2012-08-20 16:23:51 +0200
--- jobs.c	2012-08-20 16:51:36 +0200
***
*** 317,324 
  static char retcode_name_buffer[64];
  
- /* flags to detect pid wraparound */
- static pid_t first_pid = NO_PID;
- static int pid_wrap = -1;
- 
  #if !defined (_POSIX_VERSION)
  
--- 317,320 
***
*** 347,352 
  {
js = zerojs;
-   first_pid = NO_PID;
-   pid_wrap = -1;
  }
  
--- 343,346 
***
*** 1823,1833 
  	 as the proper pgrp if this is the first child. */
  
-   if (first_pid == NO_PID)
- 	first_pid = pid;
-   else if (pid_wrap == -1 && pid < first_pid)
- 	pid_wrap = 0;
-   else if (pid_wrap == 0 && pid >= first_pid)
- 	pid_wrap = 1;
- 
if (job_control)
  	{
--- 1817,1820 
***
*** 1863,1875 
  #endif
  
!   if (pid_wrap > 0)
! 	delete_old_job (pid);
  
! #if !defined (RECYCLES_PIDS)
!   /* Only check for saved status if we've saved more than CHILD_MAX
! 	 statuses, unless the system recycles pids. */
!   if ((js.c_reaped + bgpids.npid) >= js.c_childmax)
! #endif
! 	bgp_delete (pid);		/* new process, discard any saved status */
  
last_made_pid = pid;
--- 1850,1856 
  #endif
  
!   delete_old_job (pid);
  
!   bgp_delete (pid);		/* new process, discard any saved status */
  
last_made_pid = pid;
*** execute_cmd.c.orig	2012-08-20 16:36:10 +0200
--- execute_cmd.c	2012-08-20 16:51:14 +0200
***
*** 742,748 
  
  	/* XXX - this is something to watch out for if there are problems
! 	   when the shell is compiled without job control. */
! 	if (already_making_children && pipe_out == NO_PIPE &&
! 	last_made_pid != last_pid)
  	  {
  	stop_pipeline (asynchronous, (COMMAND *)NULL);
--- 742,750 
  
  	/* XXX - this is something to watch out for if there are problems
! 	   when the shell is compiled without job control.  Don't worry about
! 	   whether or not last_made_pid == last_pid; already_making_children
! 	   tells us whether or not there are unwaited-for children to wait
! 	   for and reap. */
! 	if (already_making_children && pipe_out == NO_PIPE)
  	  {
  	stop_pipeline (asynchronous, (COMMAND *)NULL);


Re: AIX and Interix also do early PID recycling.

2012-08-29 Thread Michael Haubenwallner

On 08/28/2012 09:21 AM, Roman Rakus wrote:
> On 08/01/2012 03:13 PM, Chet Ramey wrote:
>> On 7/30/12 10:41 AM, Roman Rakus wrote:
>>
>>> Hmm... I don't know much about boundaries of maximum number of user
>>> processes. But anyway - do you think that (re)changing js.c_childmax (when
>>> `ulimit -u' is changed) is not good?
>> Maybe it's ok up to some fixed upper bound.  But if you're going to have
>> that fixed upper bound, why not just use it as the number of job exit
>> statuses to remember all the time?
>>
> I prepared a patch which add configure option to enable and set the number of 
> job exit statuses to remember.

Why not simply use the static CHILD_MAX value instead?
This feels like what the spec means - and conforming kernels do not guarantee
more than that anyway, counting synchronous, asynchronous and substituted
commands together.

However, Linux has stopped defining CHILD_MAX (not so) recently (the value was
999), so _POSIX_CHILD_MAX (currently 25, formerly 6) would feel correct then...

Anyway, now I understand why people use pipes instead to get the child's
exit status:
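One such pipe-based scheme (a sketch, not the code from the linked thread): the child reports its own exit status over a FIFO, so the parent never has to map a possibly recycled PID back to a status.

```shell
#!/usr/bin/env bash
fifo=$(mktemp -u) && mkfifo "${fifo}"

# Child: run the real command (here just `exit 17`), then write its
# exit status into the FIFO.
( ( exit 17 ); echo $? > "${fifo}" ) &

# Parent: the status arrives over the pipe, independent of any PID.
read -r status < "${fifo}"
rm -f "${fifo}"
echo "child exit status: ${status}"    # child exit status: 17
```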

/haubi/



[PATCH] Fix process substitution with named pipes.

2013-10-31 Thread Michael Haubenwallner
When /dev/fd is missing and named pipes are used instead (like on AIX),
this snippet sometimes works right, works wrong, or hangs - depending on
the operating system's process scheduler timing:

  for x in {0..9}; do echo $x; done > >(
cnt=0; while read line; do let cnt=cnt+1; done; echo $cnt
  )

To reproduce this problem on Linux, add this line to subst.c to enforce
the problematic timing behaviour:
  #if !defined (HAVE_DEV_FD)
 +  sleep(1);
fifo_list[nfifo-1].proc = pid;
  #endif
and enforce using named pipes: bash_cv_dev_fd=absent ./configure ...
---
 subst.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/subst.c b/subst.c
index 48c89c1..afae3b7 100644
--- a/subst.c
+++ b/subst.c
@@ -5075,7 +5075,7 @@ process_substitute (string, open_for_read_in_child)
 
 #if !defined (HAVE_DEV_FD)
   /* Open the named pipe in the child. */
-  fd = open (pathname, open_for_read_in_child ? O_RDONLY|O_NONBLOCK : O_WRONLY);
+  fd = open (pathname, open_for_read_in_child ? O_RDONLY : O_WRONLY);
   if (fd < 0)
 {
   /* Two separate strings for ease of translation. */
-- 
1.8.1.5




Re: Weird process substitution behavior

2013-11-15 Thread Michael Haubenwallner

On 11/14/2013 08:56 PM, Chet Ramey wrote:
> On 11/8/13 6:26 PM, John Dawson wrote:
>> The following surprised me. I thought line 4 of the output, and certainly
>> line 5 of the output, should have said "0 /dev/fd/63" too. Is this behavior
>> a bug?
> 
> I'm still looking at this.  I have not had a great deal of time to
> investigate.

Maybe interesting:
With named pipes (some non-Linux platforms) this hangs after the first line.

Not sure which behaviour actually is the Right Thing though.

/haubi/



Re: Pb bash with process substitution on AIX : compilation logs for bash 4.2

2013-11-29 Thread Michael Haubenwallner
Hi!

On 11/28/2013 02:32 PM, Flene TOUMANI wrote:
> Is it possible to get a feedback on the issue? (E.g. a confirmation that this 
> is a bug).

Sounds like you've run into this problem (patch available):
http://lists.gnu.org/archive/html/bug-bash/2013-10/msg00114.html

/haubi/



bash-shipped getcwd() replacement does not work on interix.

2007-12-20 Thread Michael Haubenwallner
Machine: i586
OS: interix5.2
Compiler: gcc 
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i586'
-DCONF_OSTYPE='interix5.2' -DCONF_MACHTYPE='i586-pc-interix5.2'
-DCONF_VENDOR='pc'
-DLOCALEDIR='/tools/snapshot/prefix-launcher-1pre.20071219/i586-pc-interix5.2/share/locale'
 -DPACKAGE='bash' 
-DLOCAL_PREFIX=/tools/snapshot/prefix-launcher-1pre.20071219/i586-pc-interix5.2 
-DSHELL -DHAVE_CONFIG_H -DNO_MAIN_ENV_ARG -DBROKEN_DIRENT_D_INO -D_POSIX_SOURCE 
  -I.  -I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2 
-I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2/include 
-I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2/lib   -g -O2
uname output: Interix pc312001 5.2 SP-9.0.3790.3034 x86
Intel_x86_Family6_Model15_Stepping6
Machine Type: i586-pc-interix5.2

Bash Version: 3.2 
Patch Level: 33
Release Status: release

Description:
Bash uses its getcwd() replacement if libc provides a getcwd() that
cannot allocate the buffer itself when called without one.
This override is done in config-bot.h, with an exception already in
place for Solaris.
The problem now is that the getcwd() replacement does not work on
Interix (SUA 5.2 here).
There's only one source location, in builtins/common.c, that really
relies on getcwd(0,0) allocating the buffer.
But even there the code is conditional on GETCWD_BROKEN.
So why not simply drop the requirement that getcwd() allocate the
buffer, and use the getcwd() replacement only if libc does not
provide one at all?

Repeat-By:
$ PWD= /tools/snapshot/prefix-launcher-1pre.20071219/i586-pc-interix5.2/bin/bash
shell-init: error retrieving current directory: getcwd: cannot access
parent directories: No such file or directory

Fix: (patch attached)
builtins/common.c:
    Do not depend on getcwd() doing buffer allocation.
config-bot.h:
    Ignore GETCWD_BROKEN, keep HAVE_GETCWD as is.
Additionally, the check for GETCWD_BROKEN can be dropped
from configure.in and aclocal.m4.

Thanks!

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level
diff -ru builtins/common.c builtins/common.c
--- builtins/common.c	Wed Dec 19 10:30:07 2007
+++ builtins/common.c	Wed Dec 19 10:34:58 2007
@@ -479,11 +479,8 @@
 
   if (the_current_working_directory == 0)
 {
-#if defined (GETCWD_BROKEN)
-  the_current_working_directory = getcwd (0, PATH_MAX);
-#else
-  the_current_working_directory = getcwd (0, 0);
-#endif
+  char *t = xmalloc(PATH_MAX);
+  the_current_working_directory = getcwd (t, PATH_MAX);
   if (the_current_working_directory == 0)
 	{
 	  fprintf (stderr, _("%s: error retrieving current directory: %s: %s\n"),
diff -ru config-bot.h config-bot.h
--- config-bot.h	Wed Dec 19 10:30:06 2007
+++ config-bot.h	Wed Dec 19 10:31:16 2007
@@ -70,14 +70,6 @@
 #  define TERMIOS_MISSING
 #endif
 
-/* If we have a getcwd(3), but one that does not dynamically allocate memory,
-   #undef HAVE_GETCWD so the replacement in getcwd.c will be built.  We do
-   not do this on Solaris, because their implementation of loopback mounts
-   breaks the traditional file system assumptions that getcwd uses. */
-#if defined (HAVE_GETCWD) && defined (GETCWD_BROKEN) && !defined (SOLARIS)
-#  undef HAVE_GETCWD
-#endif
-
 #if !defined (HAVE_DEV_FD) && defined (NAMED_PIPES_MISSING)
 #  undef PROCESS_SUBSTITUTION
 #endif
diff -ru configure.in configure.in
--- configure.in	Wed Dec 19 10:30:09 2007
+++ configure.in	Wed Dec 19 10:37:08 2007
@@ -894,9 +894,6 @@
 BASH_FUNC_OPENDIR_CHECK
 BASH_FUNC_ULIMIT_MAXFDS
 BASH_FUNC_GETENV
-if test "$ac_cv_func_getcwd" = "yes"; then
-BASH_FUNC_GETCWD
-fi
 BASH_FUNC_POSIX_SETJMP
 BASH_FUNC_STRCOLL
 
diff -ru aclocal.m4 aclocal.m4
--- aclocal.m4	Tue Sep 12 23:18:07 2006
+++ aclocal.m4	Wed Dec 19 10:37:33 2007
@@ -684,32 +684,6 @@
 fi
 ])
 
-AC_DEFUN(BASH_FUNC_GETCWD,
-[AC_MSG_CHECKING([if getcwd() will dynamically allocate memory])
-AC_CACHE_VAL(bash_cv_getcwd_malloc,
-[AC_TRY_RUN([
-#include <stdio.h>
-#ifdef HAVE_UNISTD_H
-#include <unistd.h>
-#endif
-
-main()
-{
-	char	*xpwd;
-	xpwd = getcwd(0, 0);
-	exit (xpwd == 0);
-}
-], bash_cv_getcwd_malloc=yes, bash_cv_getcwd_malloc=no,
-   [AC_MSG_WARN(cannot check whether getcwd allocates memory when cross-compiling -- defaulting to no)
-bash_cv_getcwd_malloc=no]
-)])
-AC_MSG_RESULT($bash_cv_getcwd_malloc)
-if test $bash_cv_getcwd_malloc = no; then
-AC_DEFINE(GETCWD_BROKEN)
-AC_LIBOBJ(getcwd)
-fi
-])
-
 dnl
 dnl This needs BASH_CHECK_SOCKLIB, but since that's not called on every
 dnl system, we can't use AC_PREREQ


Re: bash-shipped getcwd() replacement does not work on interix.

2007-12-20 Thread Michael Haubenwallner
On Thu, 2007-12-20 at 12:30 +0100, Andreas Schwab wrote:
> Michael Haubenwallner <[EMAIL PROTECTED]> writes:
> 
> > diff -ru builtins/common.c builtins/common.c
> > --- builtins/common.c   Wed Dec 19 10:30:07 2007
> > +++ builtins/common.c   Wed Dec 19 10:34:58 2007
> > @@ -479,11 +479,8 @@
> >  
> >if (the_current_working_directory == 0)
> >  {
> > -#if defined (GETCWD_BROKEN)
> > -  the_current_working_directory = getcwd (0, PATH_MAX);
> > -#else
> > -  the_current_working_directory = getcwd (0, 0);
> > -#endif
> > +  char *t = xmalloc(PATH_MAX);
> > +  the_current_working_directory = getcwd (t, PATH_MAX);
> 
> The length of the cwd may be bigger than PATH_MAX.

Possibly - but there are three (ok, two) other locations in bash-3.2
where buffer[PATH_MAX] is passed to getcwd():

1) jobs.c:
 current_working_directory()
2) parse.y:
 decode_prompt_string()

3) lib/readline/examples/fileman.c:
 com_pwd()
 ok, this is just an example, and it uses 1024 instead of PATH_MAX.

Instead of using PATH_MAX, why not have some xgetcwd() that does the
malloc itself (when getcwd does not allocate), growing the buffer
whenever getcwd() fails with ERANGE?

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level





Re: bash-shipped getcwd() replacement does not work on interix.

2007-12-21 Thread Michael Haubenwallner
On Thu, 2007-12-20 at 08:08 -0500, Chet Ramey wrote:
> Michael Haubenwallner wrote:
> > Machine: i586
> > OS: interix5.2
> > Compiler: gcc 
> > Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i586'
> > -DCONF_OSTYPE='interix5.2' -DCONF_MACHTYPE='i586-pc-interix5.2'
> > -DCONF_VENDOR='pc'
> > -DLOCALEDIR='/tools/snapshot/prefix-launcher-1pre.20071219/i586-pc-interix5.2/share/locale'
> >  -DPACKAGE='bash' 
> > -DLOCAL_PREFIX=/tools/snapshot/prefix-launcher-1pre.20071219/i586-pc-interix5.2
> >  -DSHELL -DHAVE_CONFIG_H -DNO_MAIN_ENV_ARG -DBROKEN_DIRENT_D_INO 
> > -D_POSIX_SOURCE   -I.  
> > -I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2 
> > -I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2/include 
> > -I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2/lib   -g -O2
> > uname output: Interix pc312001 5.2 SP-9.0.3790.3034 x86
> > Intel_x86_Family6_Model15_Stepping6
> > Machine Type: i586-pc-interix5.2
> > 
> > Bash Version: 3.2 
> > Patch Level: 33
> > Release Status: release
> > 
> > Description:
> > Bash uses getcwd-replacement if libc provides getcwd without the
> > feature of allocating the buffer when called without one.
> > This override is done in config-bot.h, with an exception for
> > solaris already.
> > Problem now is that getcwd-replacement does not work on Interix
> > (SUA 5.2 here).
> 
> I'd be more interested in knowing why it doesn't work in this case,
> instead of discarding it.  Since I neither have nor use Interix, I
> need someone who does to investigate the issue a little bit.

It is because readdir() returns 0 (zero) for (struct dirent).(d_ino),
while stat() returns useful values for (struct stat).(st_ino), so their 
equal-comparison never succeeds.

Now, while trying to get inode number from stat() rather than readdir(),
I've seen another bug unrelated to readdir()/stat(), but still in
getcwd() replacement, causing a coredump here.

It is in the memcpy() from the internal buffer to the allocated return
buffer, but only when a minimum buffer size is specified - which is
exactly what get_working_directory() does when GETCWD_BROKEN is defined.
Does Solaris' getcwd() (per config-bot.h) allocate the buffer even when
a size is passed?

Attached patch fixes this one issue: it still allocates at least the
provided buffer size, but does the memcpy with the real path length.
When done with the buffer size, memcpy reads beyond the end of the
source buffer on the stack; the SIGSEGV was caused here because it read
beyond the whole stack frame page.

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level
--- lib/sh/getcwd.c.orig	Fri Dec 21 11:34:00 2007
+++ lib/sh/getcwd.c	Fri Dec 21 11:58:41 2007
@@ -252,9 +256,9 @@
 size_t len = pathbuf + pathsize - pathp;
 if (buf == NULL)
   {
-	if (len < (size_t) size)
-	  len = size;
-	buf = (char *) malloc (len);
+	if (len > (size_t) size)
+	  size = len;
+	buf = (char *) malloc (size);
 	if (buf == NULL)
 	  goto lose2;
   }


Re: bash-shipped getcwd() replacement does not work on interix.

2007-12-21 Thread Michael Haubenwallner
On Fri, 2007-12-21 at 13:51 +0100, Michael Haubenwallner wrote:
> On Thu, 2007-12-20 at 08:08 -0500, Chet Ramey wrote:
> > Michael Haubenwallner wrote:
> > > Machine: i586
> > > OS: interix5.2
> > > Compiler: gcc 
> > > Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i586'
> > > -DCONF_OSTYPE='interix5.2' -DCONF_MACHTYPE='i586-pc-interix5.2'
> > > -DCONF_VENDOR='pc'
> > > -DLOCALEDIR='/tools/snapshot/prefix-launcher-1pre.20071219/i586-pc-interix5.2/share/locale'
> > >  -DPACKAGE='bash' 
> > > -DLOCAL_PREFIX=/tools/snapshot/prefix-launcher-1pre.20071219/i586-pc-interix5.2
> > >  -DSHELL -DHAVE_CONFIG_H -DNO_MAIN_ENV_ARG -DBROKEN_DIRENT_D_INO 
> > > -D_POSIX_SOURCE   -I.  
> > > -I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2 
> > > -I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2/include 
> > > -I/tss/prefix-launcher-1pre.20071219/buildroot/bash/bash-3.2/lib   -g -O2
> > > uname output: Interix pc312001 5.2 SP-9.0.3790.3034 x86
> > > Intel_x86_Family6_Model15_Stepping6
> > > Machine Type: i586-pc-interix5.2
> > > 
> > > Bash Version: 3.2 
> > > Patch Level: 33
> > > Release Status: release
> > > 
> > > Description:
> > > Bash uses getcwd-replacement if libc provides getcwd without the
> > > feature of allocating the buffer when called without one.
> > > This override is done in config-bot.h, with an exception for
> > > solaris already.
> > > Problem now is that getcwd-replacement does not work on Interix
> > > (SUA 5.2 here).
> > 
> > I'd be more interested in knowing why it doesn't work in this case,
> > instead of discarding it.  Since I neither have nor use Interix, I
> > need someone who does to investigate the issue a little bit.
> 
> It is because readdir() returns 0 (zero) for (struct dirent).(d_ino),
> while stat() returns useful values for (struct stat).(st_ino), so their 
> equal-comparison never succeeds.

Attached patch should fix this issue, not relying on readdir() returning
valid d_ino, but doing stat() always instead.

But ideally there should be a configure check or something like it for
whether readdir() returns a valid d_ino, to subsequently avoid the
additional stat().

Moving the alloca() into a separate function was necessary because
there is no realloca() or the like, and wasting stack on each
iteration is bad.

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level





Re: bash-shipped getcwd() replacement does not work on interix.

2007-12-22 Thread Michael Haubenwallner
On Sat, 2007-12-22 at 10:13 -0500, Chet Ramey wrote:
> Michael Haubenwallner wrote:
> >> It is because readdir() returns 0 (zero) for (struct dirent).(d_ino),
> >> while stat() returns useful values for (struct stat).(st_ino), so their 
> >> equal-comparison never succeeds.
> > 
> > Attached patch should fix this issue, not relying on readdir() returning
> > valid d_ino, but doing stat() always instead.
> 
> You didn't attach one.

Uh oh, indeed, sorry. Here it is.

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level
--- lib/sh/getcwd.c.orig	Fri Dec 21 11:34:00 2007
+++ lib/sh/getcwd.c	Fri Dec 21 14:37:57 2007
@@ -58,6 +58,24 @@
 #  define NULL 0
 #endif
 
+static int concat_path_and_stat(char *dotp, size_t dotlen,
+	char *nam, size_t namlen,
+	struct stat *st, char mount_point, ino_t thisino,
+	int *saved_errno)
+{
+  char *name;
+  name = alloca(dotlen + 1 + namlen + 1);
+  memcpy(name, dotp, dotlen);
+  name[dotlen] = '/';
+  memcpy(&name[dotlen+1], nam, namlen+1);
+  if (stat(name, st) < 0)
+return -1;
+  if (mount_point || st->st_ino == thisino)
+  if (lstat(name, st) < 0)
+	  *saved_errno = errno;
+  return 0;
+}
+
 /* Get the pathname of the current working directory,
and put it in SIZE bytes of BUF.  Returns NULL if the
directory couldn't be determined or SIZE was too small.
@@ -169,31 +187,15 @@
 	  (d->d_name[1] == '\0' ||
 		(d->d_name[1] == '.' && d->d_name[2] == '\0')))
 	continue;
-	  if (mount_point || d->d_fileno == thisino)
-	{
-	  char *name;
-
-	  namlen = D_NAMLEN(d);
-	  name = (char *)
-		alloca (dotlist + dotsize - dotp + 1 + namlen + 1);
-	  memcpy (name, dotp, dotlist + dotsize - dotp);
-	  name[dotlist + dotsize - dotp] = '/';
-	  memcpy (&name[dotlist + dotsize - dotp + 1],
-		  d->d_name, namlen + 1);
-	  if (lstat (name, &st) < 0)
-		{
-#if 0
-		  int save = errno;
-		  (void) closedir (dirstream);
-		  errno = save;
-		  goto lose;
-#else
-		  saved_errno = errno;
-#endif
-		}
-	  if (st.st_dev == thisdev && st.st_ino == thisino)
-		break;
-	}
+	namlen = D_NAMLEN(d);
+	if (concat_path_and_stat(dotp, dotlist + dotsize - dotp,
+		d->d_name, namlen,
+		&st, mount_point, thisino,
+		&saved_errno
+	) < 0)
+		goto lose;
+	if (st.st_dev == thisdev && st.st_ino == thisino)
+	  break;
 	}
   if (d == NULL)
 	{


Re: bash's own getcwd reads uninitialized/nonexistent memory

2008-01-24 Thread Michael Haubenwallner
On Wed, 2008-01-23 at 17:45 +0100, Philippe De Muyter wrote:

> here is a patch :

LOL - this is a patch very similar to
http://lists.gnu.org/archive/html/bug-bash/2007-12/msg00084.html

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level





[bash-3.2.39] race condition on AIX when using libtool with bash

2008-07-30 Thread Michael Haubenwallner
Hi,

have some strange race condition here on aix5.3 with bash-3.2.39, when
using CONFIG_SHELL=/path/to/bash, building in parallel (-j16) with
libtool. It works when using /bin/ksh.

Sporadically there are two lines missing in the libtool-generated
'file.lo', while other files in the same build directory contain these
lines. The broken files are different ones amongst different runs of the
same package build - and sometimes it does not happen at all.

The missing lines are from this command sequence in libtool-1.5.26:

[1]
http://git.savannah.gnu.org/gitweb/?p=libtool.git;a=blob;f=ltmain.in;h=48facb91640a8fd43ad0e7ce4139ec0ccb4bfa09;hb=branch-1-5#l1004
1004   test -z "$run" && cat >> ${libobj}T <<EOF
[...]
[2]
http://git.savannah.gnu.org/gitweb/?p=libtool.git;a=blob;f=ltmain.in;h=48facb91640a8fd43ad0e7ce4139ec0ccb4bfa09;hb=branch-1-5#l1330
[3]
http://git.savannah.gnu.org/gitweb/?p=libtool.git;a=blob;f=ltmain.in;h=48facb91640a8fd43ad0e7ce4139ec0ccb4bfa09;hb=branch-1-5#l1880


Any idea what could happen here?



To see what's going on I've already added this a few lines before:
truss -f -o ${libobj}.truss.out -p $$ &
sleep 1

Now I can see (stripped the unimportant):
open("GetWMCMapW.loT", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE) = 4
kfcntl(4, 14, 0x0001)   = 1
close(4)= 0
open("/tmp//sh-thd-1217607265", 
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE) = 4
kfcntl(4, F_DUPFD, 0x)  = 5
kwrite(5, " p i c _ o b j e c t = '".., 32) = 32
close(5)= 0
open("/tmp//sh-thd-1217607265", O_RDONLY|O_LARGEFILE) = 5
close(4)= 0
unlink("/tmp//sh-thd-1217607265")   = 0
kfcntl(5, 14, 0x)   = 0
close(5)= 0
execve("/usr/bin/cat", 0x200234E8, 0x20026968)  argc: 1
 argv: cat
kread(0, " p i c _ o b j e c t = '".., 4096) = 32
kwrite(1, " p i c _ o b j e c t = '".., 32) = 32
kread(0, " p i c _ o b j e c t = '".., 4096) = 0
close(1)= 0
_exit(0)

So I'm sure the missing commands above _are_ executed.


My speculation:
1) bash opens the here-document twice, first O_WRONLY, second O_RDONLY,
dup2'ing the second handle to stdin before doing exec('cat'), removing
the file immediately after the second open.

2) ksh opens the here-document only once, with O_RDWR, and dup's that
handle to stdin before doing exec('cat'), removing the file immediately
after the open.

Could it be that, when the file is opened the second time, the content
of the first write isn't on disk yet (because the content might fit
into some AIX write buffer), or that it has already been removed by
someone else in the meantime, or something like that?

For completeness, these are the contents of one good and one broken
file.lo from the same build:

good$ cat DrArc.lo
# DrArc.lo - a libtool object file
# Generated by ltmain.sh - GNU libtool 1.5.26 (1.1220.2.493 2008/02/01 
16:58:18)
#
# Please DO NOT delete this file!
# It is necessary for linking the library.

# Name of the PIC object.
pic_object='.libs/DrArc.o'

# Name of the non-PIC object.
non_pic_object=none

bad$ cat GetWMCMapW.lo
# GetWMCMapW.lo - a libtool object file
# Generated by ltmain.sh - GNU libtool 1.5.26 (1.1220.2.493 2008/02/01 
16:58:18)
#
# Please DO NOT delete this file!
# It is necessary for linking the library.

# Name of the PIC object.
# Name of the non-PIC object.
non_pic_object=none

Thoughts?

Thanks!

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level





Re: [bash-3.2.39] race condition on AIX when using libtool with bash

2008-07-31 Thread Michael Haubenwallner

On Wed, 2008-07-30 at 18:53 +0200, Michael Haubenwallner wrote:

> Now I can see (stripped the unimportant):
> open("GetWMCMapW.loT", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE) = 4
> kfcntl(4, 14, 0x0001)   = 1
> close(4)= 0
> open("/tmp//sh-thd-1217607265", 
> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE) = 4
> kfcntl(4, F_DUPFD, 0x)  = 5
> kwrite(5, " p i c _ o b j e c t = '".., 32) = 32
> close(5)= 0
> open("/tmp//sh-thd-1217607265", O_RDONLY|O_LARGEFILE) = 5
> close(4)= 0
> unlink("/tmp//sh-thd-1217607265")   = 0
> kfcntl(5, 14, 0x)   = 0
> close(5)= 0
> execve("/usr/bin/cat", 0x200234E8, 0x20026968)  argc: 1
>  argv: cat
> kread(0, " p i c _ o b j e c t = '".., 4096) = 32
> kwrite(1, " p i c _ o b j e c t = '".., 32) = 32
> kread(0, " p i c _ o b j e c t = '".., 4096) = 0
> close(1)= 0
> _exit(0)
> 
> So I'm sure the missing commands above _are_ executed.
> 
> 
> My speculation:
> 1) bash opens the here-document twice, first O_WRONLY, second O_RDONLY,
> dup2'ing the second handle to stdin before doing exec('cat'), removing
> the file immediately after the second open.
> 
> 2) ksh opens the here-document only once, with O_RDWR, and dup's that
> handle to stdin before doing exec('cat'), removing the file immediately
> after the open.

Forgot to mention that ksh does fseek() to zero before exec("cat").

> 
> Could one think of: when opening the file the second time, the content
> of the first write isn't on-disk yet (because the content might fit into
> some aix write buffer), or already removed by someone else in the
> meantime, or something like that?

Sorry, I was wrong here: "cat" actually _can_ read the content, so the
double open is not the problem. Although IMHO it still might be better,
for security and performance reasons, to open() only once and fseek()
to zero like ksh does.

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level





Re: [bash-3.2.39] race condition on AIX when using libtool with bash

2008-08-04 Thread Michael Haubenwallner

On Wed, 2008-07-30 at 18:53 +0200, Michael Haubenwallner wrote:
> Hi,
> 
> have some strange race condition here on aix5.3 with bash-3.2.39, when
> using CONFIG_SHELL=/path/to/bash, building in parallel (-j16) with
> libtool. It works when using /bin/ksh.

Now it has happened with /bin/ksh too, so this is not a bash problem at
all, sorry for the noise.

/haubi/
-- 
Michael Haubenwallner
Gentoo on a different level