wait unblocks before signals processed

2012-11-05 Thread Elliott Forney
While trying to modify some code I found in an earlier post for
running N jobs in parallel, I came across the interesting behavior
illustrated below.  It appears that the wait command returns before
all of my SIGUSR2 signals have been processed.  Is this a bug or just
a fact of life?  I understand that it isn't possible to know whether a
process will receive a signal in the future, but I am surprised that
the signals aren't received and processed in time in this case.

On a related note, I think it would be very nice if there were a way
to wait for ANY background job to finish.  Currently it seems like one
can only wait for either ALL jobs or else a single job with a given
PID.  Would it be possible to have something like 'wait -' that would
block until any of the current background jobs completes?  This would
make writing simple parallel loops much easier.  The busy-wait/SIGUSR2
solution is kind of a hack, and for such a simple problem I would
prefer not to depend on GNU parallel.

#!/bin/bash

nrunning=0
nmax=3

function job_wrap
{
  echo "sleeping: $2 nrunning: $nrunning"
  eval "$@"
  kill -s USR2 $$
}

trap ': $(( --nrunning ))' USR2
for x in {1..20}
do
  while [[ nrunning -ge nmax ]]
  do
    : # busy wait
  done

  : $(( ++nrunning ))
  job_wrap sleep $(( RANDOM % 3 )) &
done

echo 'start wait'
wait
trap - USR2
echo 'end wait'

$ ./par_sigusr
sleeping: 0 nrunning: 1
sleeping: 2 nrunning: 2
sleeping: 0 nrunning: 3
sleeping: 1 nrunning: 3
sleeping: 0 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 0 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 0 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 0 nrunning: 3
sleeping: 1 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 1 nrunning: 3
sleeping: 2 nrunning: 3
start wait
sleeping: 2 nrunning: 3
end wait
$ ./par_sigusr: line 10: kill: (16287) - No such process
./par_sigusr: line 10: kill: (16287) - No such process

Thanks!
---
Elliott Forney
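
For what it's worth, the "wait for ANY background job" behaviour asked
for above later became available as `wait -n`, in bash 4.3 and newer.
The following is only a sketch of how the throttled loop might look
with it. The job() function and the fractional sleep times are
hypothetical stand-ins for the real work, and the counters are plain
bookkeeping, not something bash maintains.

```shell
#!/usr/bin/env bash
# Sketch only: throttle background jobs with `wait -n` (needs bash 4.3+).

nmax=3          # concurrency limit, as in the script above
nrunning=0      # jobs started but not yet waited for
completed=0     # jobs we have waited for

job() {         # hypothetical stand-in for the real work
  sleep "0.$(( RANDOM % 3 ))"
}

for x in {1..20}; do
  if (( nrunning >= nmax )); then
    wait -n || true               # block until any one job exits; status ignored
    nrunning=$(( nrunning - 1 ))
    completed=$(( completed + 1 ))
  fi
  job &
  nrunning=$(( nrunning + 1 ))
done

# Drain whatever is still running, one job at a time.
while (( nrunning > 0 )); do
  wait -n || true
  nrunning=$(( nrunning - 1 ))
  completed=$(( completed + 1 ))
done

echo "completed: $completed"
```

No signal handler is involved here, so the counter is only ever
modified by the main loop itself.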



Re: wait unblocks before signals processed

2012-11-05 Thread Elliott Forney
Of course, this code probably also has a race condition around
--nrunning which makes it even less usable.

Thanks,
---
Elliott Forney
E-Mail: id...@cs.colosetate.edu




Re: wait unblocks before signals processed

2012-11-05 Thread Dan Douglas
Hi Elliott. The behavior of wait differs depending upon whether you are in 
POSIX mode. Try this script, which I think does essentially what you're after 
(also here: https://gist.github.com/3911059 ):

#!/usr/bin/env bash

${BASH_VERSION+shopt -s lastpipe extglob}

if [[ -v .sh.version ]]; then
    builtin getconf
    function BASHPID.get {
        # The archive stripped the input redirection on the next line;
        # /proc/self/stat is the presumed original target.
        read -r .sh.value _ </proc/self/stat
    }
fi

function f {
    printf '%d: sleeping %d sec\n' "${@:1:2}" >&2
    sleep "$2"

    printf '%d: returning %d\n' "$1" "$3" >&2
    return "$3"
}

function main {
    typeset -i n= j= maxj=$(getconf _NPROCESSORS_ONLN)

    set -m
    trap '((j--))' CHLD

    while ((n++<30)); do
        f "$BASHPID" $(((RANDOM%5)+1)) $((RANDOM%2)) &
        ((++j >= maxj)) && POSIXLY_CORRECT= wait
    done

    echo 'finished, waiting for remaining jobs...' >&2
    wait
}

main "$@"
echo

# vim: set fenc=utf-8 ff=unix ts=4 sts=4 sw=4 ft=sh nowrap et:


The remaining issues are making it work in other shells (Bash in non-POSIX 
mode agrees with ksh, but ksh doesn't agree with POSIX), and also I can't 
think of a reasonable way to retrieve the exit statuses. The status of "wait" 
is rather useless here. Otherwise I think this is the best approach, using 
SIGCHLD and relying upon the POSIX wait behavior. See here: 
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_11

An issue to be aware of is that the trap will fire when any child exits 
including command/process substitutions or pipelines etc. If any are located 
within the main loop then monitor mode needs to be toggled off around them.
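
On the exit-status point: one common workaround, separate from the
SIGCHLD approach above and not combined with its throttling, is to
remember each job's PID from $! and then call wait with that PID,
since wait with a PID operand returns that child's exit status. A
minimal sketch follows. The subshell workload and the array names are
made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: collect per-job exit statuses by waiting on each saved PID.

pids=()
for i in 1 2 3 4; do
  # Hypothetical job: exits with status 1 for odd i, 0 for even i.
  ( sleep 0.1; exit $(( i % 2 )) ) &
  pids+=("$!")                    # PID of the job just started
done

statuses=()
for pid in "${pids[@]}"; do
  if wait "$pid"; then            # wait PID returns that child's status
    rc=0
  else
    rc=$?
  fi
  statuses+=("$rc")
done

printf 'statuses: %s\n' "${statuses[*]}"
```

The drawback is that the jobs are waited for in start order, so this
does not tell you which job finished first.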
-- 
Dan Douglas



Re: wait unblocks before signals processed

2012-11-05 Thread Elliott Forney
OK, I see that in POSIX mode a trap on SIGCHLD will cause wait to
unblock.  We are still maintaining a counter of running jobs, though,
so it seems to me that there could be a race condition in the
following line:

trap '((j--))' CHLD

if two processes quit in rapid succession and one trap gets preempted
in the middle of ((j--)) then the count may be off.  Is this possible?

I tried to test whether or not traps are mutually exclusive with the
following code and got more interesting warnings.  The count appears
to suggest that there is indeed a race condition going on here, but I
am unsure what "bad value in trap_list" means.

#!/bin/bash

count=0

function dummy
{
  usleep $RANDOM
}

set -m
trap ': $(( ++count ))' CHLD

for i in {1..1000}
do
  dummy $i &
done

wait

echo $count


$ ./trap_race
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
983

Thanks,
---
Elliott Forney


On Mon, Nov 5, 2012 at 5:11 PM, Dan Douglas wrote:
> Hi Elliott. The behavior of wait differs depending upon whether you are in
> POSIX mode. Try this script, which I think does essentially what you're after
> (also here: https://gist.github.com/3911059 ):
>
> #!/usr/bin/env bash
>
> ${BASH_VERSION+shopt -s lastpipe extglob}
>
> if [[ -v .sh.version ]]; then
> builtin getconf
> function BASHPID.get {
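> # The archive stripped the input redirection on the next line;
> # /proc/self/stat is the presumed original target.
> read -r .sh.value _ </proc/self/stat
> }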
> fi
>
> function f {
> printf '%d: sleeping %d sec\n' "${@:1:2}" >&2
> sleep "$2"
>
> printf '%d: returning %d\n' "$1" "$3" >&2
> return "$3"
> }
>
> function main {
> typeset -i n= j= maxj=$(getconf _NPROCESSORS_ONLN)
>
> set -m
> trap '((j--))' CHLD
>
> while ((n++<30)); do
> f "$BASHPID" $(((RANDOM%5)+1)) $((RANDOM%2)) &
> ((++j >= maxj)) && POSIXLY_CORRECT= wait
> done
>
> echo 'finished, waiting for remaining jobs...' >&2
> wait
> }
>
> main "$@"
> echo
>
> # vim: set fenc=utf-8 ff=unix ts=4 sts=4 sw=4 ft=sh nowrap et:
>
>
> The remaining issues are making it work in other shells (Bash in non-POSIX
> mode agrees with ksh, but ksh doesn't agree with POSIX), and also I can't
> think of a reasonable way to retrieve the exit statuses. The status of "wait"
> is rather useless here. Otherwise I think this is the best approach, using
> SIGCHLD and relying upon the POSIX wait behavior. See here:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_11
>
> An issue to be aware of is that the trap will fire when any child exits
> including command/process substitutions or pipelines etc. If any are located
> within the main loop then monitor mode needs to be toggled off around them.
> --
> Dan Douglas



Re: wait unblocks before signals processed

2012-11-05 Thread Dan Douglas
On Monday, November 05, 2012 05:52:41 PM Elliott Forney wrote:
> OK, I see in POSIX mode that a trap on SIGCHLD will cause wait to
> unblock.  We are still maintaining a counter of running jobs though so
> it seems to me that there could be a race condition in the following line
> 
> trap '((j--))' CHLD
> 
> if two processes quit in rapid succession and one trap gets preempted
> in the middle of ((j--)) then the count may be off.  Is this possible?
> 

I believe that Bash guarantees the trap will run once for every child that 
exits, so it should be impossible for the count to be off. See: 
https://lists.gnu.org/archive/html/bug-bash/2012-05/msg00055.html

I think you might be experiencing other known bugs. Chet pushed several 
wait/job related commits within the last few weeks. I haven't tested these 
yet. http://git.savannah.gnu.org/cgit/bash.git/tree/CWRU/CWRU.chlog?h=devel
-- 
Dan Douglas



Re: wait unblocks before signals processed

2012-11-05 Thread Elliott Forney
> I believe that Bash guarantees the trap will run once for every child that
> exits, so it should be impossible for the count to be off. See:
> https://lists.gnu.org/archive/html/bug-bash/2012-05/msg00055.html

I guess my question is "can more than one trap run simultaneously?"
The more I think about it though, this is probably not possible.  It
looks like the trap doesn't run in a subprocess and I presume traps
are blocked inside of other traps.
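
A quick way to check the "not in a subprocess" part is to have the
handler record $BASHPID and compare it with the PID of the shell that
installed the trap. This is only a sketch under the assumption that
the shell signals itself with the builtin kill. USR1, the variable
names and the loop count are arbitrary choices, and it does not by
itself prove anything about two handlers interleaving.

```shell
#!/usr/bin/env bash
# Sketch only: show that a trap handler runs inside the shell that set it.

self=$BASHPID       # PID of the shell installing the trap
handler_pid=
count=0

# The handler records which process it runs in and bumps a counter.
trap 'handler_pid=$BASHPID; count=$(( count + 1 ))' USR1

for i in 1 2 3 4 5; do
  kill -s USR1 "$self"    # builtin kill; the shell signals itself
done

trap - USR1
echo "shell PID: $self, handler ran in PID: $handler_pid, count: $count"
```

Since the handler modifies count in the shell's own process, the
variable is visible to the main script afterwards, which would not be
the case if the trap ran in a subshell.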

> I think you might be experiencing other known bugs. Chet pushed several
> wait/job related commits within the last few weeks. I haven't tested these
> yet. http://git.savannah.gnu.org/cgit/bash.git/tree/CWRU/CWRU.chlog?h=devel

Sorry, I should look before posting.  I cloned the latest devel branch
of bash and now I see the following occasionally but it may still be a
work in progress.

$ ./trap_race
4.2.37(3)-maint
register_alloc: 0x9779a8 already in table as allocated?
register_alloc: 0x979378 already in table as allocated?
100

Thanks,
---
Elliott Forney