How does one compare different schedulers, anyways? Is it mostly "word
of mouth" and reputation? Feature sets are good to look at but they're
not really a quantitative metric. Are there any third party
comparisons of various schedulers? Is there a niche where one
scheduler outperforms another?

I've never seen a useful comparison or even discussion of schedulers.
as far as I can tell, part of the problem is that the conceptual domain
is not developed enough to permit real, general tool-ness.

I don't mean that schedulers aren't useful.  it's dead simple to throw
together a package that lets users submit/control jobs, arbitrates a queue,
matches jobs to resources, fires them up, etc. they all do it, and you can write such a system from scratch in literally a few programmer-days if you know what you're doing. it's the details that matter, and that's where existing schedulers, even though they are functional, are not good tools.
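to make the "dead simple" part concrete, here is a minimal sketch of such a core in Python. everything in it is hypothetical and not taken from any real scheduler: the names ToyScheduler, submit, cancel, finish and schedule, the assumption that a job fits on a single node, and the first-fit matching policy are all illustrative choices.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Lower priority value runs first; seq breaks ties in submission order.
    priority: int
    seq: int
    name: str = field(compare=False)
    cores: int = field(compare=False)


class ToyScheduler:
    """Hypothetical minimal scheduler: submit, cancel, match, start."""

    def __init__(self, nodes):
        # nodes: mapping of node name -> free core count (assumed layout)
        self.free = dict(nodes)
        self.queue = []      # heap of pending jobs
        self.running = {}    # job name -> (node, cores)
        self._seq = itertools.count()

    def submit(self, name, cores, priority=0):
        heapq.heappush(self.queue, Job(priority, next(self._seq), name, cores))

    def cancel(self, name):
        # Drop a pending job; running jobs are not touched in this sketch.
        self.queue = [j for j in self.queue if j.name != name]
        heapq.heapify(self.queue)

    def finish(self, name):
        # Return a finished job's cores to its node.
        node, cores = self.running.pop(name)
        self.free[node] += cores

    def schedule(self):
        """Start every queued job that fits on some node (first fit)."""
        started, waiting = [], []
        while self.queue:
            job = heapq.heappop(self.queue)
            node = next((n for n, c in self.free.items() if c >= job.cores), None)
            if node is None:
                waiting.append(job)   # no node has enough free cores right now
                continue
            self.free[node] -= job.cores
            self.running[job.name] = (node, job.cores)
            started.append((job.name, node))
        for job in waiting:
            heapq.heappush(self.queue, job)
        return started
```

note that even this toy already embeds policy: small jobs are allowed to start ahead of a blocked larger one, so a big job can starve. whether that is right depends entirely on the site, which is exactly the kind of detail the sketch does not settle.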

to me, a good tool has a certain patina of conceptual integrity about it. for instance, what it takes to make a good compiler is well known: we are all familiar with its two interfaces, source language and machine code. there are differences in quality, but for the most part, compilers all work alike. we are all at least somewhat aware of the long history of compilers, littered with the kind of mistakes that bring learning.
you know what to expect from a screwdriver or drillpress,
though they may differ in size or power. your programming language may be more torx than phillips, or you may prefer a multibit screwdriver.
but we all know it needs a comfortable handle, a certain size and rigidity,
a blade that fits the screw, etc.

schedulers are more like an insanely ramified swiss army knife:
feature-completeness is sometimes detrimental, and extreme featurefulness sometimes means
it's guaranteed to not do what you want, only something vaguely in that
direction. I think there's a physics-of-software principle here, that features always lead to less flexibility. (that doesn't deny that techniques
like refactoring help, but they _only_ introduce a discontinuity between
regions of featuritis.  if nothing else, the dragging weight of
back-compatibility is piecewise-monotonic...)

Perhaps my quest for a quantitative metric is stupid. Maybe this is
one of the many areas of technology where things are more qualitative
than quantitative anyways. Price/performance is always hard to define
but for schedulers this seems impossible.

nah.  it's a domain problem: what a scheduler should do and how is simply
not well-defined, so scheduler companies just go for quantity to win you over.

The other issue seems to be per-core licensing. To me it seems as an

well, or licensing at all. it would be different if you were paying SchedulerCo to "make everything work the way I want", AND they could actually do it. instead, you pay for the thing they want to sell, and then spend huge amounts of your time fighting it and ultimately erecting shims and scaffolds around it to get it closer to "right".

admin the amount of time and effort one puts into configuring a
scheduler for a 50 core system and a 2000 core system is not grossly
different (maybe I am wrong?).

depends on what you want. if you're undemanding and merely want to hit the "users can submit jobs which eventually run somehow" milestone,
then there's certainly no reason to pay for anything, and you can expect it
to take a competent sysadmin a few hours to set up. ie, the goal is "keep the machineroom warm".

maybe my organization is uniquely picky, but I would say that after 5 years
elapsed, and probably > 2 staff-years manhandling it, our (commercial)
scheduler does maybe 65% of what it should.  we're in the process of
switching to another (mostly commercial) scheduler, and I expect that with
1-2 staff-years of investment, we'll have it close to 65 or 70%.

The license cost on the other hand
scales with cores. That makes the justification even harder.

per-core licensing is just asinine: vendors seem to think that since the government taxes everyone, that's a fine revenue model for them too. they don't consider it from the other direction: that the amount of development and support effort is pretty close to constant per installation (ie, specifically _not_ a function of core count).
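the arithmetic behind that complaint can be sketched in a few lines of Python, using the 50-core and 2000-core systems mentioned earlier in the thread. the function names and the per-core price are hypothetical; no real vendor's price list is implied, and the price cancels out of the ratio anyway.

```python
def license_cost(cores, price_per_core):
    """Pure per-core pricing: cost grows linearly with core count."""
    return cores * price_per_core


def cost_ratio(small_cores, large_cores, price_per_core=1.0):
    """How many times more the larger site pays under per-core pricing."""
    return license_cost(large_cores, price_per_core) / license_cost(
        small_cores, price_per_core
    )


# 50-core vs 2000-core installation, as in the question above.
ratio = cost_ratio(50, 2000)
```

under these assumptions the larger site pays 40 times as much, whatever the per-core price is, while the vendor's effort per installation is claimed to be roughly constant.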

we can argue about tax/politics/society over beer sometime,
but "soak the rich" is not optimal in the marketplace.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing