On 9/3/14, 10:08 AM, crispusfairba...@gmail.com wrote:

> $ cat parallel-test.bash
> function process_job {
>     sleep 1
> }
>
> function main {
>     typeset -i index=0 cur_jobs=0 max_jobs=6
>     trap '((cur_jobs--))' CHLD
>     set -m
>
>     while ((index++ < 30)); do
>         echo -n "index: $index, cur_jobs: $cur_jobs"
>         set +m
>         childs=$(pgrep -P $$ | wc -w)
>         (( childs < cur_jobs )) && echo -n ", actual childs: $childs"
>         echo
>         set -m
>         process_job &
>         ((++cur_jobs >= max_jobs)) && POSIXLY_CORRECT= wait;
>     done
>
>     echo 'finished, waiting for remaining jobs...'
>     wait
> }
>
> main
> echo "done"
>
> This works on:
> GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
>
> But on:
> GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
> and
> GNU bash, version 4.3.24(1)-release (x86_64-unknown-linux-gnu)
>
> it will around "index: 9" start missing traps (not decrementing cur_jobs):
>
> $ bash-4.3.24/bin/bash parallel-test.bash
> index: 1, cur_jobs: 0
> index: 2, cur_jobs: 1
> index: 3, cur_jobs: 2
> index: 4, cur_jobs: 3
> index: 5, cur_jobs: 4
> index: 6, cur_jobs: 5
> index: 7, cur_jobs: 5
> index: 8, cur_jobs: 5
> index: 9, cur_jobs: 5, actual childs: 4
> index: 10, cur_jobs: 5, actual childs: 3
> index: 11, cur_jobs: 5, actual childs: 3
> ...
>
>
> If the sleep is changed to be random, it might work correctly for the whole
> 30 iterations, which points to a race condition somewhere?

The problem is running the wait builtin in posixly-correct mode. That
causes the first SIGCHLD to interrupt wait (as Posix requires) and results
in timing issues.

I will look at making SIGCHLD traps more reliable in the face of the Posix
requirements.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
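
For anyone hitting the same symptom: one possible way to sidestep the issue is to stop counting children in a CHLD trap and ask bash for the number of running jobs instead. The sketch below is an assumption on my part, not the fix Chet describes above. It relies on `wait -n`, which exists in bash 4.3 and later (so it would not run as intended on the 4.1.2 system from the report). The names `running_jobs` and `run_all` are hypothetical, and `process_job` is only a stand-in for the real work.

```shell
#!/usr/bin/env bash

# Stand-in for the real work, as in the original report (shortened sleep).
process_job() {
    sleep 0.1
}

# Ask the shell how many background jobs are still running,
# rather than maintaining a counter from a CHLD trap.
running_jobs() {
    jobs -pr | wc -l
}

# run_all TOTAL MAX_JOBS: start TOTAL jobs, at most MAX_JOBS at a time.
run_all() {
    local -i index=0 total=${1:-30} max_jobs=${2:-6}

    while (( index++ < total )); do
        process_job &
        # At the limit: block until some job exits, then re-check the count.
        while (( $(running_jobs) >= max_jobs )); do
            wait -n    # bash 4.3 or later
        done
    done

    # Collect whatever is still running.
    wait
    echo "done"
}

run_all 30 6
```

Because the count is recomputed from the job table on every pass, a SIGCHLD that arrives while `wait` is being interrupted cannot leave a stale counter behind; the trade-off is one `jobs | wc` pipeline per check.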