Platform LSF is one of the best available offerings if you consider the overall administrative burden, APIs, quality of documentation and quality of support. In some cases the commercial software license more than pays for itself by giving you a product that is stable, well documented and has a very low admin/operational burden. Trying to save money with an open source product and then needing to hire additional people to keep it from falling over is a mistake I've seen at more than one site.

That said, I'm personally a Grid Engine zealot these days and use/deploy it often. It has open source and commercial flavors, an amazing support community and a high level of product acceptance & market share in the life sciences, which is where I do most of my work.

In the interest of full disclosure I do Grid Engine consulting & training so I'm not totally unbiased here.

When it comes to PBS variants I'd avoid the pure open source versions of PBS/Torque - I don't think I've ever been in an OpenPBS or Torque shop that has not altered the source code or otherwise dug deeply into the product. For people considering the PBS route I always recommend checking in with the PBS Pro people first.

Just my $.02

-Chris



On Oct 6, 2009, at 9:22 PM, Rahul Nabar wrote:

Any strong / weak recommendations for / against schedulers? For a long
time we have worked happily with a Torque + Maui system. It isn't
perfect but works (and is free!). But rarely does a chance present
itself to go for something "newer and better" on an in-production
system since people hate changes and outages. This time as we shop for
a new cluster it presents me the opportunity to change if something
better exists.

Any comments? What are other users using out there?  Any horror
stories? Or any super good finds?

I shy away from LSF etc. since those cost a lot of money, especially as
they and similar systems are mostly licensed per server per year, so
the costs do add up. I have been a user on LSF systems for a long
time and I think it is an awesome scheduler, but I have never been at the
admin end of LSF.

One way in which the Torque+Maui option is not the best is that it is
not monolithic. Oftentimes it is hard to know which component to blame
for a problem or, more to the point, which config file to use to fix a
problem: Torque or Maui. On the other hand, we can't get rid of Maui
since Fairshare policies etc. are important to us and those seem to be
in the Maui domain. (All our jobs are MPI jobs, in case that is
relevant. We haven't been doing checkpointing yet.)

Of course, there is MOAB these days, but I am not sure if that is
worth the money since I have not used it.

I appreciate any comments or words of wisdom you guys might have!

--
Rahul
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
