Interesting. We (and by "we" I mean my time at the UC Berkeley College of Chemistry) used to implement multiple queues with various time restrictions to accommodate short, medium, long, and extended-run jobs. It was an honor system, to be sure, but I spent a great deal of time working with the researchers on an individual level to foster the trust that an honor system needs. There was also a little logic to allow submitted jobs to skew toward one end of the spectrum if the cluster was not fully utilized, and not expected to be. Working that closely with folks also allowed us to chart cluster usage for about a month (and sometimes much more), so we could tweak cluster policy if appropriate.
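Purely as an illustration (not our actual configuration -- the queue names, hour limits, and thresholds below are invented), the tiering plus the "let jobs skew longer when the cluster is quiet" rule amounted to something like:

# Hypothetical sketch of tiered walltime limits and the "skew when idle" rule.
# Queue names, hour limits, and the 50% threshold are invented for illustration.

QUEUE_LIMITS_HOURS = {
    "short": 4,
    "medium": 24,
    "long": 72,
    "extended": 168,
}

def allowed_walltime(queue, current_utilization, expected_utilization):
    """Walltime cap (hours) for a job submitted to `queue`.

    If the cluster is lightly used now and not expected to fill up,
    let the job skew one tier longer than its nominal limit.
    """
    tiers = list(QUEUE_LIMITS_HOURS)
    if current_utilization < 0.5 and expected_utilization < 0.5:
        bumped = min(tiers.index(queue) + 1, len(tiers) - 1)
        return QUEUE_LIMITS_HOURS[tiers[bumped]]
    return QUEUE_LIMITS_HOURS[queue]

# e.g. a "medium" job on a half-idle cluster gets the "long" cap of 72 hours
print(allowed_walltime("medium", current_utilization=0.3, expected_utilization=0.4))

In practice this lived in scheduler policy rather than a script, but the idea is the same.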


It worked out for the most part, but there was the occasional scofflaw. With the trust relationship I had with the researchers, we could usually nag the scofflaws back into line.

Layer 8 issues can certainly lead to trouble, but they can also be used to your advantage!

Just a personal observation. I realize this kind of thing would not work everywhere.

-geoff


PS: Sorry for any duplicate copies of this email; I am having some ISP issues this week.




On 16.01.2008 at 18:16, Craig Tierney <[EMAIL PROTECTED]> wrote:

Geoff wrote:


..Interesting discussion deleted..

As a funny aside, I once knew a sysadmin who applied 24-hour time limits to all queues on all clusters he managed in order to force researchers to think about checkpoints and smart restarts. I couldn't understand why so many folks from his particular unit kept asking me about arrays inside the scheduler submission scripts and nested commands until I found that out. Unfortunately, I came to the conclusion that folks in his unit were spending more time writing job submission scripts than code... well... maybe that is an exaggeration.


Our queue limits are 8 hours.  They are set this way for two reasons.
First, we have real-time jobs that need to get through the queues and
we believe that allowing significantly longer jobs would block those
really important jobs.  Second, for a multi-user system, it isn't very
fair for a user to run multi-day jobs and prevent shorter jobs from getting
in.  It is about being fair.  Use the resource and then get back in line.

I know that at other US Government facilities it is common practice to
set sub-day queue limits. I recently helped set up one site that had
queue limits set at 12 hours.  Another large organization near the top
of the Top500 list does this as well.

This means that codes need check-pointing.  Although we are all waiting
for the holy grail of system-level check-pointing, the odds of it being
implemented consistently across architectures AND without a significant
performance hit are slim.  This means that researchers also have to be
software engineers. If they want to get real work done, adding check-pointing
is one of the steps. As one operations manager at a major HPC site once said
to me, 'codes that don't support check-pointing aren't real codes'.
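For what it's worth, the application-level version doesn't have to be fancy. A minimal sketch of the save/restore pattern (the file name, interval, and state layout are just placeholders, not any particular site's code):

import os
import pickle

CHECKPOINT_FILE = "state.chk"   # placeholder file name
CHECKPOINT_EVERY = 100          # steps between saves, purely illustrative
TOTAL_STEPS = 10000

def load_checkpoint():
    """Resume from the last saved state, or start fresh."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "value": 0.0}

def save_checkpoint(state):
    """Write to a temp file and rename, so a mid-write kill can't corrupt it."""
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT_FILE)

state = load_checkpoint()
while state["step"] < TOTAL_STEPS:
    state["value"] += 1.0               # stand-in for the real computation
    state["step"] += 1
    if state["step"] % CHECKPOINT_EVERY == 0:
        save_checkpoint(state)

When the queue's 8-hour limit kills the job, resubmitting it simply picks up from the last checkpoint.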

Allowing users to run for days or weeks as SOP is begging for failure.
Did that sysadmin who set 24-hour time limits ever analyze the amount
of computational time lost because of larger time limits?

Craig




--
-------------------------------
Geoff Galitz, [EMAIL PROTECTED]
Blankenheim, Deutschland
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
