Re: [Beowulf] scheduler recommendations for a HPC cluster

Mark Hahn Wed, 07 Oct 2009 17:01:07 -0700

How does one compare different schedulers, anyways? Is it mostly "word
of mouth" and reputation? Feature sets are good to look at but it's
not really a quantitative metric. Are there any third party
comparisons of various schedulers? Do they have a niche that one
scheduler outperforms another?


I've never seen a useful comparison or even discussion of schedulers.
as far as I can tell, part of the problem is that the conceptual domain
is not developed enough to permit real, general tool-ness.

I don't mean that schedulers aren't useful.  it's dead simple to throw
together a package that lets users submit/control jobs, arbitrates a queue,
matches jobs to resources, fires them up, etc. they all do it, and you
can write such a system from scratch in literally a few programmer-days
if you know what you're doing. it's the details that matter, and that's
where existing schedulers, even though they are functional, are not
good tools.
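As a rough illustration of how little the basic submit/queue/match/start loop involves, here is a minimal sketch. Everything in it is hypothetical and invented for this example (the names `Job`, `Node`, `ToyScheduler`, strict FIFO ordering, first-fit placement); it does not reflect any real scheduler, and "starting" a job only records a placement rather than launching a process.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int  # cores requested by the job


@dataclass
class Node:
    name: str
    free_cores: int  # cores currently unallocated on this node


class ToyScheduler:
    """Strict-FIFO queue with first-fit placement; purely illustrative."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.queue = deque()
        self.running = {}  # job name -> (Job, Node)

    def submit(self, job):
        self.queue.append(job)

    def dispatch(self):
        """Start queued jobs in order; stop at the first job that does not fit."""
        started = []
        while self.queue:
            job = self.queue[0]
            node = next((n for n in self.nodes if n.free_cores >= job.cores), None)
            if node is None:
                break  # the head of the queue blocks everything behind it
            self.queue.popleft()
            node.free_cores -= job.cores
            self.running[job.name] = (job, node)
            started.append((job.name, node.name))
        return started

    def finish(self, job_name):
        """Release the cores held by a finished job."""
        job, node = self.running.pop(job_name)
        node.free_cores += job.cores


sched = ToyScheduler([Node("n1", 4), Node("n2", 8)])
for job in (Job("a", 4), Job("b", 8), Job("c", 2)):
    sched.submit(job)
print(sched.dispatch())  # "a" and "b" are placed; "c" waits for free cores
sched.finish("a")
print(sched.dispatch())  # "c" now fits on n1
```

The sketch deliberately has no priorities, fair share, backfill, reservations, or failure handling. Those are the kind of details the paragraph above says actually matter.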

to me, a good tool has a certain patina of conceptual integrity about it.
for instance, what it takes to make a good compiler is well known: we
are all familiar with the two interfaces: source language and machine code.
there are differences in quality, but for the most part, compilers all
work alike. we all are at least somewhat aware of the long history of
compilers, littered with the kind of mistakes that bring learning.

you know what to expect from a screwdriver or drillpress,
though they may differ in size or power. your programming language may
be more torx than phillips, or you may prefer a multibit screwdriver.
but we all know it needs a comfortable handle, a certain size and rigidity,
blade for fitting the screw, etc.

schedulers are more like an insanely ramified swiss army knife: feature
completeness is sometimes detrimental, and extreme featurefulness sometimes means
it's guaranteed to not do what you want, only something vaguely in that
direction. I think there's a physics-of-software principle here, that
features always lead to less flexibility. (that doesn't deny that techniques
like refactoring help, but they _only_ introduce a discontinuity between
regions of featuritis.  if nothing else, the dragging weight of
back-compatibility is piecewise-monotonic...)

Perhaps my quest for a quantitative metric is stupid. Maybe this is
one of the many areas of technology where things are more qualitative
than quantitative anyways. Price/ performance is always hard to define
but for schedulers this seems impossible.


nah.  it's a domain problem: what a scheduler should do and how is simply
not well-defined, so scheduler companies just go for quantity to win you over.

The other issue seems to be per core licensing. To me it seems as an

well, or licensing at all. it would be different if you were paying
SchedulerCo to "make everything work the way I want", AND they could
actually do it. instead, you pay for the thing they want to sell,
and then spend huge amounts of your time fighting it and ultimately
erecting shims and scaffolds around it to get it closer to "right".

admin the amount of time and effort one puts in configuring a
scheduler for a 50 core system and a 2000 core system is not grossly
too different (maybe I am wrong?).

depends on what you want. if you're undemanding and merely want to
hit the "users can submit jobs which eventually run somehow" milestone,
then there's certainly no reason to pay for anything, and you can expect it
to take a competent sysadmin a few hours to set up. ie, the goal is
"keep the machineroom warm".


maybe my organization is uniquely picky, but I would say that after 5 years
elapsed, and probably > 2 staff-years manhandling it, our (commercial)
scheduler does maybe 65% of what it should.  we're in the process of
switching to another (mostly commercial) scheduler, and I expect that with
1-2 staff-years of investment, we'll have it close to 65 or 70%.

The license cost on the other hand
scales with cores. That makes the justification even harder.

per-core licensing is just asinine: vendors seem to think that since
the government taxes everyone, that's a fine revenue model for them too.
they don't consider it from the other direction: that the amount of
development and support effort is pretty close to constant per installation
(ie, specifically _not_ a function of core count).
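To put numbers on that mismatch, here is a small sketch using the 50-core and 2000-core systems from the quoted question. The function name and the unit fee are hypothetical placeholders, not real prices; only the ratio between the two results means anything.

```python
def per_core_license_cost(cores, fee_per_core):
    """License cost under a per-core pricing model."""
    return cores * fee_per_core


FEE = 1.0  # hypothetical unit fee; only the ratio below is meaningful

small = per_core_license_cost(50, FEE)
large = per_core_license_cost(2000, FEE)

# The per-core license grows 40x between the two systems, while the
# vendor-side effort described above stays roughly constant per installation.
print(large / small)  # 40.0
```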


we can argue about tax/politics/society over beer sometime,
but "soak the rich" is not optimal in the marketplace.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
