wait unblocks before signals processed
While trying to modify some code I found in an earlier post for
running N jobs in parallel, I came across the interesting behavior
illustrated below. It appears that the wait command proceeds before
my SIGUSR2 signals have all been processed. Is this a bug or just a
fact of life? I understand that it isn't possible to know if a process
will receive a signal in the future, but I am surprised that the
signals aren't received and processed in time in this case.

On a related note, I think it would be very nice if there were a way
to wait for ANY background job to finish. Currently it seems like one
can only wait for either ALL jobs or else a single job with a given
PID. Would it be possible to have something like 'wait -' that would
block until any of the current background jobs completes? This would
make writing simple parallel loops much easier. The busy-wait/SIGUSR2
solution is kind of a hack, and for such a simple problem I would
prefer not to depend on GNU parallel.

#!/bin/bash

nrunning=0
nmax=3

function job_wrap
{
    echo "sleeping: $2 nrunning: $nrunning"
    eval "$@"
    kill -s USR2 $$
}

trap ': $(( --nrunning ))' USR2
for x in {1..20}
do
    while [[ nrunning -ge nmax ]]
    do
        : # busy wait
    done

    : $(( ++nrunning ))
    job_wrap sleep $(( RANDOM % 3 )) &
done

echo 'start wait'
wait
trap - USR2
echo 'end wait'

$ ./par_sigusr
sleeping: 0 nrunning: 1
sleeping: 2 nrunning: 2
sleeping: 0 nrunning: 3
sleeping: 1 nrunning: 3
sleeping: 0 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 0 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 0 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 0 nrunning: 3
sleeping: 1 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 2 nrunning: 3
sleeping: 1 nrunning: 3
sleeping: 2 nrunning: 3
start wait
sleeping: 2 nrunning: 3
end wait
$ ./par_sigusr: line 10: kill: (16287) - No such process
./par_sigusr: line 10: kill: (16287) - No such process

Thanks!
---
Elliott Forney
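For readers on a newer bash: releases from 4.3 onward have a `wait -n`
option that blocks until any one background job exits, which is close
to the 'wait -' asked for above. It did not exist when this message was
written. The following is only a sketch under that assumption; the
names run_limited, job and log are hypothetical and do not come from
the thread.

```bash
#!/usr/bin/env bash
# Sketch only: run_limited, job and log are hypothetical names.
# Assumes bash 4.3 or newer, where `wait -n` is available.

log=$(mktemp)

job() {                          # stand-in for real work
    sleep "$1"
    echo "finished sleep $1" >> "$log"
}

run_limited() {
    local nmax=$1 nrunning=0 arg
    shift
    for arg in "$@"; do
        if (( nrunning >= nmax )); then
            wait -n              # block until any one background job exits
            (( nrunning-- ))
        fi
        job "$arg" &
        (( ++nrunning ))
    done
    wait                         # collect the jobs that are still running
}

run_limited 3 0.2 0.1 0.3 0.1 0.2
cat "$log"
```

No signal handler or busy-wait loop is needed here: the counter is only
ever touched by the main loop, so there is nothing for a trap to race
with.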
Re: wait unblocks before signals processed
Of course, this code probably also has a race condition around
--nrunning, which makes it even less usable.

Thanks,
---
Elliott Forney
E-Mail: id...@cs.colosetate.edu

On Mon, Nov 5, 2012 at 4:33 PM, Elliott Forney wrote:
> While trying to modify some code I found on an earlier post for
> running N jobs in parallel I came across the interesting behavior
> illustrated below. It appears that the wait command proceeds before
> my SIGUSR's are all processed. Is this a bug or just a fact of life?
> I understand that it isn't possible to know if a process will receive
> a signal in the future but I am surprised that the signals aren't
> received and processed in time in this case.
>
> On a related note, I think it would be very nice if there were a way
> to wait for ANY background job to finish. Currently it seems like one
> can only wait for either ALL jobs or else a single job with a given
> PID. Would it be possible to have something like 'wait -' that would
> block until any of the current background jobs completes? This would
> make writing simple parallel loops much easier. The busy-wait/SIGUSR
> solution is kindof a hack and for such a simple problem I would prefer
> not to depend on gnu parallel.
Re: wait unblocks before signals processed
Hi Elliott. The behavior of wait differs depending upon whether you are in
POSIX mode. Try this script, which I think does essentially what you're after
(also here: https://gist.github.com/3911059 ):

#!/usr/bin/env bash

${BASH_VERSION+shopt -s lastpipe extglob}

if [[ -v .sh.version ]]; then
    builtin getconf
    function BASHPID.get {
        read -r .sh.value _ </proc/self/stat
    }
fi

function f {
    printf '%d: sleeping %d sec\n' "${@:1:2}" >&2
    sleep "$2"

    printf '%d: returning %d\n' "$1" "$3" >&2
    return "$3"
}

function main {
    typeset -i n= j= maxj=$(getconf _NPROCESSORS_ONLN)

    set -m
    trap '((j--))' CHLD

    while ((n++<30)); do
        f "$BASHPID" $(((RANDOM%5)+1)) $((RANDOM%2)) &
        ((++j >= maxj)) && POSIXLY_CORRECT= wait
    done

    echo 'finished, waiting for remaining jobs...' >&2
    wait
}

main "$@"
echo

# vim: set fenc=utf-8 ff=unix ts=4 sts=4 sw=4 ft=sh nowrap et:

The remaining issues are making it work in other shells (Bash in non-POSIX
mode agrees with ksh, but ksh doesn't agree with POSIX), and also I can't
think of a reasonable way to retrieve the exit statuses. The status of "wait"
is rather useless here. Otherwise I think this is the best approach, using
SIGCHLD and relying upon the POSIX wait behavior. See here:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_11

An issue to be aware of is that the trap will fire when any child exits,
including command/process substitutions or pipelines etc. If any are located
within the main loop then monitor mode needs to be toggled off around them.
-- 
Dan Douglas
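On the exit-status question: one possible approach, separate from the
SIGCHLD technique in the script above, is to remember each job's PID
and call wait on it individually. This is only a sketch, assuming it is
acceptable to collect statuses in launch order once the jobs have been
started; worker and the status values are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: worker and the status values are hypothetical.
# Each background job's PID is recorded so that `wait PID` can
# report that job's own exit status later.

worker() {                       # stand-in for a real job
    sleep 0.1
    return "$1"
}

pids=()
statuses=()

for code in 0 1 0 2; do
    worker "$code" &
    pids+=("$!")
done

for pid in "${pids[@]}"; do
    rc=0
    wait "$pid" || rc=$?         # capture the job's status without aborting
    statuses+=("$rc")
done

printf 'statuses: %s\n' "${statuses[*]}"
```

This does not by itself limit how many jobs run at once, so it would
have to be combined with some form of throttling.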
Re: wait unblocks before signals processed
OK, I see that in POSIX mode a trap on SIGCHLD will cause wait to
unblock. We are still maintaining a counter of running jobs, though,
so it seems to me that there could be a race condition in the
following line

    trap '((j--))' CHLD

if two processes quit in rapid succession and one trap gets preempted
in the middle of ((j--)), then the count may be off. Is this possible?

I tried to test whether or not traps are mutually exclusive with the
following code and got more interesting warnings. The count appears
to suggest that there is indeed a race condition going on here, but I
am unsure what "bad value in trap_list" means.

#!/bin/bash

count=0

function dummy
{
    usleep $RANDOM
}

set -m
trap ': $(( ++count ))' CHLD
for i in {1..1000}
do
    dummy $i &
done

wait
echo $count

$ ./trap_race
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
983

Thanks,
---
Elliott Forney

On Mon, Nov 5, 2012 at 5:11 PM, Dan Douglas wrote:
> Hi Elliott. The behavior of wait differs depending upon whether you are in
> POSIX mode.
> Try this script, which I think does essentially what you're after
> (also here: https://gist.github.com/3911059 ):
>
> #!/usr/bin/env bash
>
> ${BASH_VERSION+shopt -s lastpipe extglob}
>
> if [[ -v .sh.version ]]; then
>     builtin getconf
>     function BASHPID.get {
>         read -r .sh.value _ </proc/self/stat
>     }
> fi
>
> function f {
>     printf '%d: sleeping %d sec\n' "${@:1:2}" >&2
>     sleep "$2"
>
>     printf '%d: returning %d\n' "$1" "$3" >&2
>     return "$3"
> }
>
> function main {
>     typeset -i n= j= maxj=$(getconf _NPROCESSORS_ONLN)
>
>     set -m
>     trap '((j--))' CHLD
>
>     while ((n++<30)); do
>         f "$BASHPID" $(((RANDOM%5)+1)) $((RANDOM%2)) &
>         ((++j >= maxj)) && POSIXLY_CORRECT= wait
>     done
>
>     echo 'finished, waiting for remaining jobs...' >&2
>     wait
> }
>
> main "$@"
> echo
>
> # vim: set fenc=utf-8 ff=unix ts=4 sts=4 sw=4 ft=sh nowrap et:
>
>
> The remaining issues are making it work in other shells (Bash in non-POSIX
> mode agrees with ksh, but ksh doesn't agree with POSIX), and also I can't
> think of a reasonable way to retrieve the exit statuses. The status of "wait"
> is rather useless here. Otherwise I think this is the best approach, using
> SIGCHLD and relying upon the POSIX wait behavior. See here:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_11
>
> An issue to be aware of is that the trap will fire when any child exits
> including command/process substitutions or pipelines etc. If any are located
> within the main loop then monitor mode needs to be toggled off around them.
> --
> Dan Douglas
Re: wait unblocks before signals processed
On Monday, November 05, 2012 05:52:41 PM Elliott Forney wrote:
> OK, I see in POSIX mode that a trap on SIGCHLD will cause wait to
> unblock. We are still maintaining a counter of running jobs though so
> it seems to me that there could be a race condition in the following line
>
> trap '((j--))' CHLD
>
> if two processes quit in rapid succession and one trap gets preempted
> in the middle of ((j--)) then the count may be off. Is this possible?
>

I believe that Bash guarantees the trap will run once for every child that
exits, so it should be impossible for the count to be off. See:
https://lists.gnu.org/archive/html/bug-bash/2012-05/msg00055.html

I think you might be experiencing other known bugs. Chet pushed several
wait/job related commits within the last few weeks. I haven't tested these
yet.
http://git.savannah.gnu.org/cgit/bash.git/tree/CWRU/CWRU.chlog?h=devel
-- 
Dan Douglas
Re: wait unblocks before signals processed
> I believe that Bash guarantees the trap will run once for every child that
> exits, so it should be impossible for the count to be off. See:
> https://lists.gnu.org/archive/html/bug-bash/2012-05/msg00055.html

I guess my question is "can more than one trap run simultaneously?"
The more I think about it, though, this is probably not possible. It
looks like the trap doesn't run in a subprocess, and I presume traps
are blocked inside of other traps.

> I think you might be experiencing other known bugs. Chet pushed several
> wait/job related commits within the last few weeks. I haven't tested these
> yet.
> http://git.savannah.gnu.org/cgit/bash.git/tree/CWRU/CWRU.chlog?h=devel

Sorry, I should have looked before posting. I cloned the latest devel
branch of bash and now I occasionally see the following, but it may
still be a work in progress.

$ ./trap_race
4.2.37(3)-maint
register_alloc: 0x9779a8 already in table as allocated?
register_alloc: 0x979378 already in table as allocated?
100

Thanks,
---
Elliott Forney
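The first half of that guess, that the handler runs in the shell's own
process rather than in a subprocess, can be checked with a sketch like
the one below. It sends USR1 to the shell itself instead of relying on
real child exits, so it says nothing about the SIGCHLD warnings above.
The names main_pid and handler_pid are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: main_pid and handler_pid are hypothetical names.
# The shell signals itself, so the test does not depend on the
# timing of child processes.

main_pid=$BASHPID
count=0
handler_pid=

# The handler updates variables of the main shell and records the
# process it ran in.
trap '(( ++count )); handler_pid=$BASHPID' USR1

for i in 1 2 3 4 5; do
    kill -s USR1 "$main_pid"     # the trap runs once this command completes
done

trap - USR1
echo "count=$count handler_pid=$handler_pid main_pid=$main_pid"
```

Because the handler changes count in the main shell and sees the same
PID, it cannot be running in a separate process. Whether two handlers
can interleave is a separate question that this sketch does not settle.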