This mail was spurred on by users in the #bash IRC channel. It started after reading <http://mywiki.wooledge.org/ProcessManagement>, where the article introduces an example that uses 'wait -n' to provide capped parallelism:

    #!/usr/bin/env bash

    # number of processes to run in parallel
    num_procs=5

    # function that processes one item
    my_job() {
        printf 'Processing %s\n' "$1"
        sleep "$(( RANDOM % 5 + 1 ))"
    }

    i=0
    while IFS= read -r line; do
        if (( i++ >= num_procs )); then
            wait -n   # wait for any job to complete. New in 4.3
        fi
        my_job "$line" &
    done < inputlist
    wait # wait for the remaining processes

The question is how this example keeps parallelism capped at num_procs. Below I've provided a synthetic scenario which hopefully highlights my (and others') confusion.

The logic is to provide two loops: the first generates an initially slow feed of "work" for the second, which starts "agents" in the background. The iteration counter 'i' is compared against 'nproc' (for which I use 3) so that 'wait -n' is only called once 'i' equals or exceeds 'nproc'.

As each feed item and each backgrounded agent initially take 2 seconds, only one agent is running at a time, one after the other. A typical process tree in top or htop might look something like this:

    bash scriptname
    |- bash scriptname (while read)
    |  `- bash scriptname (agent)
    |     `- sleep 2
    `- bash scriptname (slowthenfast)
       `- sleep 2

After ten items (about 20 seconds), 'i' has already incremented well beyond 'nproc'. It is only now that the feed rate speeds up dramatically (one item every 0.1 seconds), providing more work for the agents. As a result, more agents are started at once, while the nproc limit is still maintained:

    bash scriptname
    |- bash scriptname
    |  |- bash scriptname
    |  |  `- sleep 2
    |  |- bash scriptname
    |  |  `- sleep 2
    |  `- bash scriptname
    |     `- sleep 2
    `- bash scriptname
       `- sleep 0.1

I have no idea why or how this works, and I hope the list can help explain this behaviour.

---

My intuition, or assumption, is as follows: I would expect the if statement in the second loop to always succeed. The loop would then call 'wait -n' and wait for the existing agent to end (as I assume it's the only job running at this point).
Once it ends, a new agent is started and the loop goes back to 'wait -n'. In effect it should keep starting only one agent after the other, e.g.:

    agent0 (this is the last agent that ran before the loop speed increased)
    while read
      (i++ >= nproc) => always true
      wait -n => waits for agent0 (as it's the only job?)
      agent0 ends
      agent1 starts
    while read
      (i++ >= nproc) => always true
      wait -n => waits for agent1 (as it's the only job?)
      agent1 ends
      agent2 starts
    while read
      (i++ >= nproc) => always true
      wait -n => waits for agent2 (as it's the only job?)
      agent2 ends
      agent3 starts

But what appears to be happening is this:

    agent0 (this is the last agent that ran before the loop speed increased)
    while read
      (i++ >= nproc) => always true
      wait -n => waits for agent0 (as it's the only job?)
      agent0 ends
      agent1 starts
      agent2 starts
      agent3 starts

---

    #!/bin/bash

    nproc=3

    agent() {
        printf 'agent: %d: started... (i is %d)\n' "$1" "$2"
        sleep 2
        printf 'agent: %d: finished\n' "$1"
    }

    slowthenfast() {
        local a=0
        while :; do
            printf '%d\n' "$a"
            if (( a >= 10 )); then
                sleep 0.1
            else
                sleep 2
            fi
            (( ++a ))
        done
    }

    i=0
    slowthenfast | while read -r work; do
        if (( i++ >= nproc )); then
            wait -n
        fi
        agent "$work" "$i" &
    done
    wait
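To watch the cap from inside the script, the reproducer can be instrumented. The sketch below is a minimal variant, assuming bash 4.3 or later (for 'wait -n') and a 'sleep' that accepts fractional seconds. The feed is made finite (18 items, the first 6 slow), the timings are shortened, and each agent records how many agents were running when it started by counting marker files in a temporary directory. The names 'feed', 'run_capped' and the marker-file scheme are hypothetical additions, not part of the original script.

```shell
#!/usr/bin/env bash
# Instrumented sketch of the slowthenfast/agent reproducer (requires bash >= 4.3).
# Hypothetical additions: a finite feed, shorter timings, and marker files
# that let each agent count how many agents are running when it starts.

nproc=3

# Hypothetical finite feed: 18 items, the first 6 slow, the rest fast.
feed() {
    local a
    for (( a = 0; a < 18; a++ )); do
        printf '%d\n' "$a"
        if (( a >= 6 )); then
            sleep 0.01
        else
            sleep 0.2
        fi
    done
}

# agent DIR ID: create a marker, count all live markers, log "ID COUNT",
# simulate work, then remove the marker before exiting.
agent() {
    local dir=$1 id=$2 markers
    : > "$dir/running.$id"
    markers=( "$dir"/running.* )
    printf '%d %d\n' "$id" "${#markers[@]}" >> "$dir/log"
    sleep 0.2
    rm -f "$dir/running.$id"
}

# run_capped DIR: the same counter-guarded 'wait -n' loop as in the mail.
run_capped() {
    local dir=$1 i=0 work
    feed | {
        while read -r work; do
            if (( i++ >= nproc )); then
                wait -n    # returns once one agent has finished
            fi
            agent "$dir" "$work" &
        done
        wait    # the agents are children of this pipeline subshell
    }
}

# Demo: one line per agent, "item running-agents-at-start", sorted by item.
demo_dir=$(mktemp -d)
run_capped "$demo_dir"
sort -n "$demo_dir/log"
rm -rf "$demo_dir"
```

Because the loop sits on the right-hand side of a pipe, it runs in a subshell and the agents are children of that subshell rather than of the main script, so the trailing 'wait' goes inside the braces. In this variant the second column should never exceed 'nproc': once 'i' reaches 'nproc', every iteration calls 'wait -n' once before starting one new agent, so 'i' counts agents started, not agents running.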