On 20.08.2013, at 20:03, Mark Hahn <h...@mcmaster.ca> wrote:

>> 1. Many people's job scripts use ssh either directly (to say clean up /tmp)
>> or indirectly from mpirun.
> 
> sure.

Well, GridEngine can catch the call to rsh/ssh (unless the job uses an absolute 
path) and route it to GridEngine's `qrsh -inherit ...`, so the remote processes 
run under SGE's control and accounting. The creation and removal of the local 
scratch directory is also handled automatically. If a job needs more than one 
scratch directory, a prolog/epilog can provide and remove them.
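To illustrate the rerouting idea: a minimal sketch of such a wrapper, placed early in $PATH under the name "ssh" inside the job environment. The function names are made up for the demo, and a stub stands in for the real `qrsh` so the sketch runs anywhere; in a real setup this is configured via the rsh_command/rsh_daemon entries in SGE's cluster configuration.

```shell
#!/bin/sh
# Stand-in for GridEngine's qrsh, so this sketch runs without SGE.
# In production the real qrsh binary would be called instead.
qrsh() { echo "qrsh $*"; }

# What a wrapper script named "ssh" in the job's $PATH would do:
# reroute the call so the remote process is started through
# sge_execd and shows up in SGE's control and accounting.
ssh_wrapper() {
    host="$1"; shift
    qrsh -inherit "$host" "$@"
}

ssh_wrapper node01 rm -rf /tmp/scratch
```

With the stub in place, the call above prints the command it would have run: `qrsh -inherit node01 rm -rf /tmp/scratch`.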

In particular, if a job is deleted, custom-coded removal of the scratch 
directory requires proper signal trapping in the job script; even then it can 
fail, since the job script itself may already have been killed. Over time this 
can fill up the local disk.
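For contrast, the fragile do-it-yourself variant looks like this (the path is made up for the demo). It only works while the script is still alive to run the trap; after a qdel the directory can be left behind, which is exactly why a prolog/epilog outside the job script is more robust.

```shell
#!/bin/sh
# Cleanup via a trap inside the job script itself.
SCRATCH="/tmp/scratch-trap-demo"

(
    # remove the scratch directory on normal exit or on SIGTERM/SIGINT;
    # if this subshell is killed with SIGKILL, the trap never runs
    trap 'rm -rf "$SCRATCH"' EXIT TERM INT
    mkdir -p "$SCRATCH"
    # ... the actual job would work in $SCRATCH here ...
)

# once the subshell has exited, the EXIT trap has fired and the
# directory is gone again
```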


>> (good mpirun's use the batch engine's per-node
>> daemon to launch the binaries not ssh).
> 
> why should a scheduler have daemons cluttering up compute nodes?

I think he is referring to a daemon like sge_execd in SGE, which receives jobs 
from the qmaster; I would expect other queuing systems to run a similar daemon 
on each node.
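From the user's side, an mpirun that uses this daemon needs no ssh setup at all. A sketch of such a job script, assuming an Open MPI built with SGE support (configured with --with-sge) and a parallel environment named "mpi" (both names are assumptions about the local setup); Open MPI then detects the SGE environment and starts its remote processes via `qrsh -inherit`, i.e. through sge_execd on each node:

```shell
#!/bin/sh
#$ -pe mpi 64
#$ -cwd
# $NSLOTS is set by SGE to the number of granted slots
mpirun -np "$NSLOTS" ./my_app
```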

-- Reuti


> also, do you really launch so many very big but very short jobs 
> that startup time is a serious concern?  I'm questioning assumptions here.
> 
> thanks, mark.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
