How does this wait -n work to cap parallelism?

Earnestly Mon, 29 Jul 2019 11:26:50 -0700

This mail was spurred on by users in the #bash IRC channel.  It started
after reading <http://mywiki.wooledge.org/ProcessManagement> where the
article introduces an example using 'wait -n' as a means to provide capped
parallelism:



        #!/usr/bin/env bash

        # number of processes to run in parallel
        num_procs=5

        # function that processes one item
        my_job() {
            printf 'Processing %s\n' "$1"
            sleep "$(( RANDOM % 5 + 1 ))"
        }

        i=0
        while IFS= read -r line; do
            if (( i++ >= num_procs )); then
                wait -n   # wait for any job to complete. New in 4.3
            fi
            my_job "$line" &
        done < inputlist
        wait # wait for the remaining processes


The question is about how the example works in order to keep
parallelism capped at num_procs.

Below I've provided a synthetic scenario which hopefully highlights my
(and others') confusion.

The logic is to provide two loops: one generates an initially slow feed of
"work" for the second loop, which starts "agents" in the background.
The iteration counter 'i' is then compared against 'nproc' (for which I use 3)
so that 'wait -n' is called once 'i' equals or exceeds 'nproc'.

As the feed initially emits one item every 2 seconds and each backgrounded
agent also takes 2 seconds, only one agent is ever running at a time, one
after the other.

A typical process tree in top or htop might look something like this:


        bash scriptname
        |- bash scriptname (while read)
        |  `- bash scriptname (agent)
        |     `- sleep 2
        `- bash scriptname (slowthenfast)
         `- sleep 2


After some time the value of 'i' will have incremented well beyond the
value of 'nproc'.  At that point (once the feed counter reaches 10) the
feed rate speeds up dramatically, providing more work for the agents.

Because of this, more agents are started, yet the nproc limit is still respected:


        bash scriptname
        |- bash scriptname
        |  |- bash scriptname
        |  |  `- sleep 2
        |  |- bash scriptname
        |  |  `- sleep 2
        |  `- bash scriptname
        |     `- sleep 2
        `- bash scriptname
         `- sleep 0.1


And I have no idea why or how this works.  I hope the list can help
explain this behaviour.

---

My intuition, or assumption, is as follows:

I would expect that the if statement in the second loop would always
succeed.  It would then call 'wait -n' and wait for the existing agent
to end (as I assume it's the only job running at this point).  Once it
ends a new agent will be started and back to the 'wait -n' the loop will
go.

In effect it should keep starting only one agent after the other.  E.g.:


        agent0 (this is the last agent that ran before the loop speed increased)

    while read
        (i++ >= nproc) => always true
            wait -n => waits for agent0 (as it's the only job?)
            agent0 ends

        agent1 starts

    while read
        (i++ >= nproc) => always true
            wait -n => waits for agent1 (as it's the only job?)
            agent1 ends

        agent2 starts

    while read
        (i++ >= nproc) => always true
            wait -n => waits for agent2 (as it's the only job?)
            agent2 ends

        agent3 starts


But what appears to be happening is this:


        agent0 (this is the last agent that ran before the loop speed increased)

    while read
        (i++ >= nproc) => always true
            wait -n => waits for agent0 (as it's the only job?)
            agent0 ends

        agent1 starts
        agent2 starts
        agent3 starts
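
To compare the two traces, it may help to look at a version of the loop in
which the bookkeeping is written out.  The following is only a sketch and not
the wiki's code: it assumes that each 'wait -n' call accounts for exactly one
background job, and it tracks that in an explicit counter.  The names
'max_jobs', 'outstanding', 'demo_job' and 'run_all' are made up for
illustration, and nothing here is a claim about how bash implements 'wait -n'
internally.

```shell
#!/usr/bin/env bash
# Sketch only: max_jobs, outstanding, demo_job and run_all are illustrative names.

max_jobs=3
outstanding=0   # jobs started in the background and not yet collected by wait

demo_job() {
    sleep 0.2
    printf 'finished %s\n' "$1"
}

run_all() {
    local item
    for item in "$@"; do
        if (( outstanding >= max_jobs )); then
            wait -n                              # collect one job (bash 4.3+)
            outstanding=$(( outstanding - 1 ))   # and account for it
        fi
        demo_job "$item" &
        outstanding=$(( outstanding + 1 ))
    done
    wait            # collect whatever is still outstanding
    outstanding=0
}

# Jobs finish in no particular order, so sort the lines before printing.
result=$(run_all a b c d e f | sort)
printf '%s\n' "$result"
```

Under that assumption 'outstanding' never exceeds 'max_jobs', so the number of
jobs actually running cannot either.  Whether the 'i++ >= nproc' test in the
original loop amounts to the same accounting is exactly what is being asked.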


---

#!/bin/bash

nproc=3

agent() {
    printf 'agent: %d: started... (i is %d)\n' "$1" "$2"
    sleep 2
    printf 'agent: %d: finished\n' "$1"
}

slowthenfast() {
    local a=0

    while :; do
        printf '%d\n' "$a"

        if (( a >= 10 )); then
            sleep 0.1
        else
            sleep 2
        fi

        (( ++a ))
    done
}

i=0
slowthenfast | while read -r work; do
    if (( i++ >= nproc )); then
        wait -n
    fi

    agent "$work" "$i" &
done

wait
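
For anyone who wants to reproduce this without watching top, here is a finite
and faster variant of the script above that measures the peak number of live
agents from the "started"/"finished" lines.  It is a sketch under several
assumptions: the 'feed' and 'run' functions, the item counts, the shortened
sleeps and the awk post-processing are all mine.  Because the feed ends here,
the final 'wait' is placed inside the pipeline's subshell, since that is the
shell that started the agents.

```shell
#!/usr/bin/env bash
# Finite variant of the reproducer (sketch; timings and helper names are assumptions).

nproc=3

agent() {
    printf 'agent: %d: started... (i is %d)\n' "$1" "$2"
    sleep 0.3
    printf 'agent: %d: finished\n' "$1"
}

# Hypothetical finite feed: 4 slow items, then 8 fast ones.
feed() {
    local a
    for (( a = 0; a < 12; a++ )); do
        printf '%d\n' "$a"
        if (( a >= 4 )); then
            sleep 0.02
        else
            sleep 0.3
        fi
    done
}

run() {
    local i=0 work
    feed | {
        while read -r work; do
            if (( i++ >= nproc )); then
                wait -n
            fi
            agent "$work" "$i" &
        done
        wait    # runs in the subshell that owns the agents
    }
}

# Count agents between their "started" and "finished" lines and keep the peak.
report=$(run | awk '
    /started/  { if (++live > max) max = live }
    /finished/ { live-- }
    END        { printf "agents: %d, max concurrent: %d\n", NR / 2, max }
')
printf '%s\n' "$report"
```

The count is taken from the log, so it can only under-report agents that have
started but not yet printed, or that have printed "finished" but not yet exited.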
