On Wed, Oct 7, 2009 at 10:22 PM, Rahul Nabar <rpna...@gmail.com> wrote:
> How does one compare different schedulers, anyways? Is it mostly "word
> of mouth" and reputation. Feature sets are good to look at but it's
> not really a quantitative metric.

I would say that, in the spirit of other benchmarks and comparisons discussed on this list, the best way is to try as many different ones as possible and make your own decision based on objective (how well it fits your needs) and subjective (how much you like its interfaces, how comfortable you feel maintaining it) criteria.

But of course this is just wishful thinking... Most resource managers and schedulers are way too complicated for a simple "let's just install it and run it" scenario. It takes an excessive amount of time to set them up, read the documentation, pull your hair out over documented but non-working features, and contact support or write to mailing lists for help, only to get something working and then find out that the users or, even worse, some administrative powers want changes and... start the cycle all over again. One can argue that taking care of a resource manager and scheduler can be a full-time job on a medium/large cluster, somewhat similar to that of a DBA.

IMHO, feature sets are not a good way of comparing schedulers - much more important is how they map to what you want to achieve and how easy it is to interact with them. It can take a lot of time to configure and test (a very important part if the admin actually cares about the users ;-)) a medium to complex setup (e.g. several queues serving several types of nodes, with different limits on hardware use or on what users are allowed to do). Often there are several ways of achieving the same result, which means that it's quite easy for a beginner to shoot him/herself in the foot by trying to set too many things at once.

The interaction with the scheduler is crucial because usually some results have to be extracted from it (efficiency of hardware usage, what share of time each user/group has consumed, how well fairshare works, etc.) and sometimes new settings have to be put into effect to change those results, let alone adding new nodes or making changes to accommodate a particular user or software...

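To make the "results have to be extracted" point a bit more concrete, here is a minimal sketch of the kind of post-processing I have in mind: per-user share of consumed core-hours and overall hardware utilization. Everything in it is an assumption made for illustration: the `JobRecord` fields, the function names and the sample numbers are hypothetical and do not correspond to any particular scheduler's accounting format, which you would have to parse first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """One finished job; a hypothetical record, not any scheduler's real format."""
    user: str
    cores: int
    wall_seconds: float


def core_hours_by_user(records):
    """Sum core-hours (cores * wall time) consumed per user."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec.user] += rec.cores * rec.wall_seconds / 3600.0
    return dict(totals)


def usage_shares(records):
    """Fraction of all consumed core-hours attributable to each user."""
    totals = core_hours_by_user(records)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {user: 0.0 for user in totals}
    return {user: hours / grand_total for user, hours in totals.items()}


def utilization(records, total_cores, window_hours):
    """Consumed core-hours divided by the core-hours the cluster offered."""
    capacity = total_cores * window_hours
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return sum(core_hours_by_user(records).values()) / capacity


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    jobs = [
        JobRecord("alice", cores=8, wall_seconds=7200),
        JobRecord("bob", cores=4, wall_seconds=3600),
        JobRecord("alice", cores=4, wall_seconds=3600),
    ]
    print(usage_shares(jobs))
    print(utilization(jobs, total_cores=16, window_hours=24))
```

Comparing such shares against the configured fairshare targets is one rough way to judge how well fairshare works, but how easy it is to get the raw records out in the first place differs a lot from one scheduler to another.
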
Another issue is that, although many features are advertised, not all work as you think. At the configuration stage and especially while testing (or, at the latest, in production...), you can find out about all kinds of limitations or implementation details which raise barriers between the resource manager or scheduler and your goals. These show up especially in interactions with various other components like MPI libraries (which need to know how to start a process on a remote node, which processors or cores to use, etc.), ISV-provided start-up scripts and existing infrastructure (Kerberos support, username length, number of secondary groups a user can belong to, etc.), but sometimes also inside the resource manager or scheduler itself (how good it is at cleaning up after a failed job, what the maximum number of queues is, etc.).

I agree with Mark Hahn about selling a solution that solves a particular site's resource management and scheduling problem and not a generic solution that has to be customized by people who might have a hard time understanding the concepts and maybe limitations of the offered solution. Of course, some academic sites might not want this... to reduce costs, to not depend on a particular vendor, to allow for new ideas to be implemented, while many companies which require resource management and scheduling probably have their staff trained for this particular task. But I see a real need in small academic groups or small companies which don't have enough manpower to dedicate to this...

Sorry for the rather negative message, but one would expect that after more than 20 years such an important piece of middleware would have reached maturity and be easy to deploy and configure.

Cheers,
Bogdan
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf