On Mon, 12 May 2008, Glen Beane wrote:

> I know TORQUE USED to be much better than SGE at controlling MPI type jobs.

I think it still is, because the long-awaited TM support in SGE still does not exist.

> If you use a PBS/TORQUE aware MPI job launcher it is pretty much impossible for any of the job processes to escape control of the batch system.

Hmm, not quite true. Just recently I had several instances where I had to kill individual processes by hand (using Torque 2.1.10). One nice thing about SGE is its use of setgroups() to add a supplementary group, taken from a reserved range, to all the processes of a job. As this call is normally available only to root, user processes cannot modify their supplementary group list and so cannot escape being killed. I used SGE in the past and don't remember ever having to clean up processes by hand.
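To illustrate why the extra group makes cleanup reliable, here is a minimal sketch (not SGE's actual code) of how leftover processes could be found by that group on Linux. It assumes a /proc filesystem whose per-process status file carries a "Groups:" line; the function names and the GID used in the usage note are made up for the example.

```python
import os


def parse_groups(status_text):
    """Return the supplementary GIDs from the 'Groups:' line of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(gid) for gid in line.split()[1:]}
    return set()


def pids_with_gid(job_gid, proc_root="/proc"):
    """List the PIDs whose supplementary groups include job_gid (hypothetical helper)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                groups = parse_groups(f.read())
        except OSError:
            continue  # the process exited during the scan, or is not readable
        if job_gid in groups:
            pids.append(int(entry))
    return sorted(pids)
```

Calling pids_with_gid(20007), with 20007 standing in for whatever GID was reserved for the job, would list every surviving process of that job no matter how it daemonized or changed its process group; each one could then be signalled with os.kill().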

[ Please note that I'm considering here only the batch system proper, not the prologue/epilogue scripts that are the usual locally applied fixes. IMHO job cleanup is basic functionality that belongs in the batch system itself. ]

> Last time I used SGE, I found the MPI support much less sophisticated than TORQUE, but this was several years ago.

This is easy to explain once you look at how they both started. Generally speaking, though, I can see that over the past few years they have started to grow similar features (e.g. SGE is getting better parallel job integration and possibly TM support, Torque is getting job-array support).

--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: [EMAIL PROTECTED]
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
