----- "Rahul Nabar" <rpna...@gmail.com> wrote:

> One thing that the Torque+Maui option is not the best is
> that it is not monolithic.

Actually, from our point of view the really good part of Torque is that the scheduler is pluggable: you can use the very simple pbs_sched, Maui or Moab, or even write your own using the examples!

> Oftentimes it is hard to know which component to blame
> for a problem or more relevant which config file to use
> to fix a problem. Torque or Maui.

We try to keep Torque *really* simple (just some queues to let a couple of applications select a walltime) and do all the smarts in Maui/Moab. For what we do we have to use Moab; Maui didn't have some of the capabilities we needed.

One thing we *really* like is that Torque's pbs_mom can run a health check script, and if that reports an error (say "ERROR /tmp full") it gets passed back to the pbs_server and Moab will mark the node as down until the error clears. This keeps a node with problems from taking jobs, meaning you can get to work on it sooner. Ours checks everything from SMART errors, MCEs and disk space through to whether the node needs rebooting for a kernel upgrade.

If you're not using Moab then you can instead simply get your health check script to run pbsnodes to mark the node offline (remembering to use -N to set an appropriate message).

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
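
P.S. For anyone wanting a starting point, here is a minimal sketch of the kind of health check described above, written in Python. It is not our production script. The 95% threshold, the function names and the --offline switch are all made up for illustration, and it assumes the machine's hostname matches the node name known to pbs_server. It covers only the /tmp check and prints the same "ERROR /tmp full" style of line mentioned above.

```python
#!/usr/bin/env python3
"""Sketch of a node health check of the kind pbs_mom can run."""
import shutil
import socket
import subprocess
import sys

# Hypothetical threshold: treat a filesystem as full at or above this fraction used.
FULL_FRACTION = 0.95


def check_disk(path, usage=shutil.disk_usage):
    """Return an error message if `path` is nearly full, else None."""
    total, used, _free = usage(path)
    if total > 0 and used / total >= FULL_FRACTION:
        return f"ERROR {path} full"
    return None


def offline_command(node, message):
    """Build the pbsnodes command that marks `node` offline with a note."""
    return ["pbsnodes", "-o", "-N", message, node]


def main(mark_offline=False):
    """Run the check, print any error line, and optionally offline the node."""
    error = check_disk("/tmp")
    if error is None:
        return None
    # The error line on stdout is what gets reported back to pbs_server.
    print(error)
    if mark_offline:
        # For sites without Moab: mark this node offline ourselves.
        # Assumes the hostname is the node name Torque uses.
        subprocess.run(offline_command(socket.gethostname(), error), check=False)
    return error


if __name__ == "__main__":
    main(mark_offline="--offline" in sys.argv[1:])
```

Run without arguments it only reports; with the (hypothetical) --offline switch it also calls pbsnodes, which is the route for sites that are not running Moab. Further checks such as SMART or MCE status would be extra functions returning a message or None in the same way.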