On 9/3/14, 10:08 AM, crispusfairba...@gmail.com wrote:

> $ cat parallel-test.bash
> function process_job {
>     sleep 1
> }
>
> function main {
>     typeset -i index=0 cur_jobs=0 max_jobs=6
>     trap '((cur_jobs--))' CHLD
>     set -m
>
>     while ((index++ < 30)); do
>         echo -n "index: $index, cur_jobs: $cur_jobs"
>         set +m
>         childs=$(pgrep -P $$ | wc -w)
>         (( childs < cur_jobs )) && echo -n ", actual childs: $childs"
>         echo
>         set -m
>         process_job &
>         ((++cur_jobs >= max_jobs)) && POSIXLY_CORRECT= wait;
>     done
>
>     echo 'finished, waiting for remaining jobs...'
>     wait
> }
>
> main
> echo "done"
>
> This works on:
> GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
>
> But on:
> GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
> and
> GNU bash, version 4.3.24(1)-release (x86_64-unknown-linux-gnu)
>
> it will around "index: 9" start missing traps (not decrementing cur_jobs):

I figured this out.  There are two problems here.

The first problem is bash's, and will result in bash missing calls to the
SIGCHLD trap under certain conditions when running `wait' in posix mode.
If bash reaps more than one child in one loop through waitpid(), it will
only mark one SIGCHLD trap as pending, so the trap runs once instead of
once per exited child.  I have attached a patch to fix this.

The second problem is a race condition introduced by your code.  Bash will
only run the SIGCHLD trap handler for each exiting child when job control
is enabled.  When you disable job control to run the command substitution,
it's possible, and likely, that one of the background jobs will be reaped
while the shell is waiting for the command substitution process to
complete.  Since job control is off at that point, bash won't run the trap
for that child, and cur_jobs is never decremented for it.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/

*** ../bash-4.3-patched/jobs.c	2014-05-14 09:20:15.000000000 -0400
--- jobs.c	2014-09-09 11:50:38.000000000 -0400
***************
*** 3340,3344 ****
  	      {
  		interrupt_immediately = 0;
! 		trap_handler (SIGCHLD);	/* set pending_traps[SIGCHLD] */
  		wait_signal_received = SIGCHLD;
  		/* If we're in a signal handler, let CHECK_WAIT_INTR pick it up;
--- 3346,3352 ----
  	      {
  		interrupt_immediately = 0;
! 		/* This was trap_handler (SIGCHLD) but that can lose traps if
! 		   children_exited > 1 */
! 		queue_sigchld_trap (children_exited);
  		wait_signal_received = SIGCHLD;
  		/* If we're in a signal handler, let CHECK_WAIT_INTR pick it up;
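
The race in the second problem comes from counting children in a SIGCHLD
trap while switching job control on and off.  One possible way around it,
sketched below, is to drop the trap entirely and throttle with `wait -n',
which is available in bash 4.3 and later.  This is a different technique
from the one in the quoted script, not a corrected copy of it.  The names
process_job, cur_jobs and max_jobs come from that script; run_jobs, peak
and the job counts are made up for illustration, and the sleep is shortened
so the sketch finishes quickly.

```shell
#!/usr/bin/env bash
# Sketch only: limit the number of concurrent background jobs without a
# CHLD trap and without toggling job control.  Assumes bash >= 4.3 (wait -n).

process_job() {
    sleep 0.1    # stand-in for real work
}

# run_jobs TOTAL MAX_JOBS  (hypothetical helper, not from the original script)
run_jobs() {
    local -i total=$1 max_jobs=$2 index=0 cur_jobs=0 peak=0

    while (( index++ < total )); do
        process_job &
        # Count the job we just started and remember the highest count seen.
        (( ++cur_jobs > peak )) && peak=cur_jobs

        # At the limit: block until any one child exits, then count it off.
        # If wait -n returns early for some reason, cur_jobs is at worst an
        # overestimate, so the limit is never exceeded.
        if (( cur_jobs >= max_jobs )); then
            wait -n
            (( cur_jobs-- ))
        fi
    done

    wait    # wait for whatever is still running
    echo "started=$((index - 1)) peak=$peak"
}

run_jobs 12 4
```

Run as is, this prints "started=12 peak=4".  Because the counter is only
changed in the main loop, right after a job is started or after `wait -n'
returns, it does not matter whether a child happens to be reaped while the
shell is busy with something else.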