Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash

Zac Medico Sat, 02 Jun 2012 23:53:20 -0700

On 06/02/2012 10:05 PM, Mike Frysinger wrote:
> On Saturday 02 June 2012 19:59:02 Brian Harring wrote:
>> On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
>>> # @FUNCTION: multijob_post_fork
>>> # @DESCRIPTION:
>>> # You must call this in the parent process after forking a child process.
>>> # If the parallel limit has been hit, it will wait for one to finish and
>>> # return the child's exit status.
>>> multijob_post_fork() {
>>>     [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>>>
>>>     : $(( ++mj_num_jobs ))
>>>     if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
>>>             multijob_finish_one
>>>     fi
>>>     return $?
>>> }
>>
>> Minor note: the design of this (fork, then check) means that when a
>> job finishes, we won't have more work ready for it.  Given a fast job
>> identification step (main thread) and a slower job execution (what's
>> backgrounded), this implicitly means we'll never exceed #cores of
>> parallelism, but we won't sustain that level either (meaning
>> potentially some idle cycles left on the floor).
>>
>> Realistically, the main thread (what invokes post_fork) is *likely*
>> (if the consumer is written sensibly) to be doing minor work,
>> mostly just poking about figuring out what the next task/arguments
>> are to submit to the pool.  That work isn't likely to be a full core
>> worth of work; if it is, as I said, the consumer is poorly designed.
>>
>> The original form of this was designed around the assumption that the
>> main thread was light, and the backgrounded jobs weren't, thus it
>> basically did the equivalent of make -j<cores>+1, allowing #cores
>> background jobs running, while allowing the main thread to continue on
>> and get the next job ready, once it had that ready, it would block
>> waiting for a slot to open, then immediately submit the job once it
>> had done a reclaim.
> 
> the original code i designed this around had a heavier main thread because it
> had a series of parallel sections followed by serial followed by parallel where
> the serial regions didn't depend on the parallel finishing right away.  that 
> and doing things post meant it was easier to pass up return values because i 
> didn't have to save $? anywhere ;).
> 
> thinking a bit more, i don't think the two methods are mutually exclusive.  
> it's easy to have the code support both, but i'm not sure the extended 
> documentation helps.


Can't you just add a multijob_pre_fork function and do your waiting in
there instead of in the multijob_post_fork function?
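
A minimal sketch of that split, under a few assumptions: the
multijob_pre_fork body below is hypothetical, multijob_finish_one is a
stand-in built on bash's `wait -n` (bash 4.3 or newer) rather than the
eclass's real bookkeeping, the argument check with die is left out so
the snippet runs outside an ebuild, and mj_max_jobs=2 is an arbitrary
demo value.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the eclass implementation.

mj_max_jobs=2   # assumed limit, for the demo only
mj_num_jobs=0

# Stand-in: reap one finished child and pass its exit status up.
multijob_finish_one() {
	local ret=0
	wait -n || ret=$?
	: $(( --mj_num_jobs ))
	return ${ret}
}

# Call in the parent *before* forking: block until a slot is free.
multijob_pre_fork() {
	local ret=0
	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
		multijob_finish_one || ret=$?
	fi
	return ${ret}
}

# Call in the parent *after* forking: only account for the new child.
multijob_post_fork() {
	: $(( ++mj_num_jobs ))
}

# Usage: the next job is already prepared by the time we block for a slot.
for i in 1 2 3 4 ; do
	multijob_pre_fork
	( sleep 0.1 ) &
	multijob_post_fork
done
wait
```

With the waiting moved into pre_fork, the parent only blocks once it
already has the next job in hand, so a freed slot is refilled right
away.  The cost is the one Mike mentions: the reaped child's exit
status now comes back from pre_fork, so the caller has to check $?
there instead of after post_fork.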
-- 
Thanks,
Zac
