Parallelism a la make -j / GNU parallel

2012-05-03 Thread Colin McEwan
Hi there,

I don't know whether this has ever been discussed or considered, but I
would be interested in any thoughts.

These days I frequently find myself writing shell scripts for multi-core
machines which could easily exploit lots of parallelism (e.g. a batch of a
hundred independent simulations).

The basic parallelism construct of '&' for asynchronous execution is highly
expressive, but it isn't useful for this sort of use case: starting 100
jobs at once leaves them competing for the same cores and leads to
excessive context switching and paging.

So for practical purposes I find myself reaching for 'make -j' or GNU
parallel, both of which destroy the expressiveness of the shell script: I
have to route commands and parameters through Makefiles or stdout, and
wrestle with appropriate levels of quoting.
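
For the record, the two workarounds end up looking roughly like the lines
below ('./simulate' is just a stand-in for the real job); the commands and
arguments have to be pushed through stdin or parallel's ::: list rather
than written as ordinary shell syntax, which is where the quoting wrestling
comes from:

seq 1 100 | xargs -n 1 -P 8 ./simulate      # at most 8 jobs at a time
parallel -j 8 ./simulate ::: $(seq 1 100)   # GNU parallel equivalent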

What I would really *like* is an extension to the shell which implements
the same sort of parallelism-limiting / 'process pooling' found in make or
'parallel', exposed as an operator in the shell language similar to '&':
one whose semantics are to continue asynchronously (like '&') *if* system
resources allow, and otherwise to wait for the process to complete (like ';').
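
The nearest I can get in the shell itself is a hand-rolled throttle around
'&'. A rough sketch only, again with './simulate' standing in for the real
job:

njobs=8                                  # upper bound on concurrent jobs

for i in $(seq 1 100); do
  while (( $(jobs -pr | wc -l) >= njobs )); do
    sleep 1                              # poll until a job slot frees up
  done
  ./simulate "$i" &                      # hypothetical unit of work
done
wait                                     # collect the stragglers

This keeps the pool full rather than stopping at batch boundaries, but the
polling is ugly, and it is still an idiom rather than an operator. (Later
bash releases, 4.3 onwards, add 'wait -n', which blocks until any one
background job exits and removes the need to poll.)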

Any thoughts, anyone?

Thanks!

-- 
C.

https://plus.google.com/109211294311109803299
https://www.facebook.com/mcewanca


Re: Parallelism a la make -j / GNU parallel

2012-05-03 Thread Colin McEwan
Indeed, I've used variations of most of these in the past. :)

My contention is that this is the sort of thing more people will want to do, 
more often, and that this is a reasonable argument in favour of including the 
functionality *correctly* in the core language, for maximum expressiveness 
without external dependencies.

I just don't know if that fits with the maintenance/extension philosophy 
applied to bash ;)

-- 
C.

On 3 May 2012, at 20:21, Elliott Forney wrote:

> Here is a construct that I use sometimes... although you might wind up
> waiting for the slowest job in each iteration of the loop:
> 
> 
> maxiter=100
> ncore=8
> 
> for iter in $(seq 1 $maxiter)
> do
>   # 'startjob' stands for whatever command runs one unit of work
>   startjob $iter &
> 
>   # every $ncore jobs, wait for the whole batch to finish
>   if (( iter % ncore == 0 ))
>   then
>     wait
>   fi
> done
> 
> 