Bogdan Costescu wrote:
> On Wed, 16 Jan 2008, Craig Tierney wrote:
>> Our queue limits are 8 hours.
>> ...
>> Did that sysadmin who set 24 hour time limits ever analyze the amount
>> of lost computational time because of larger time limits?
> While I agree with the idea and reasons of short job runtime limits, I
> disagree with your formulation. Being many times involved in discussions
> about what runtime limits should be set, I wouldn't make myself a
> statement like yours; I would say instead: YMMV. In other words: choose
> what fits better the job mix that users are actually running. If you
> have determined that 8h max. runtime is appropriate for _your_ cluster
> and increasing it to 24h would lead to a waste of computational time due
> to the reliability of _your_ cluster, then you've done your job well.
> But saying that everybody should use this limit is wrong.
First of all, I agree that it is always a YMMV case. We are good about that
here (on the list).
My point was that, in every instance I have seen, multi-day queue limits are
not the norm. Those places do have exceptions for particular codes and
particular projects. I know our system could handle 24h queues in terms of
reliability, but with the job mix we have, it would cause problems beyond
stability (we are currently looking at a new scheduler to solve that problem).
> Furthermore, although you mention that system-level checkpointing is
> associated with a performance hit, you seem to think that user-level
> checkpointing is a lot lighter, which is most often not the case.
There was an assumption in my statement that I didn't share: I was thinking
of the sort of system-level checkpointing that will probably work for
clusters, which will be some kind of VM-based solution. That would carry the
overhead of the virtual machine as well as the cost of moving the data when
the time comes.
> Apart from the obvious I/O limitations that could restrict saving &
> loading of checkpointing data, there are applications for which
> developers have chosen to not store certain data but recompute it every
> time it is needed because the effort of saving, storing & loading it is
> higher than the computational effort of recreating it - but this most
> likely means that for each restart of the application this data has to
> be recomputed.
Yes, but didn't you just say that recomputing that data is faster than the
I/O time associated with reading it? A checkpoint isn't model results; a
checkpoint is the state of the model at a particular time, so in this case
you would save that data. It's already in memory; you just need to write it
out along with every other bit of relevant information. No extra computation
is needed.
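
To be concrete about what I mean by user-level checkpointing, here is a
minimal sketch in C. The struct, sizes, and file name are made up for
illustration; a real model would have more to save, but the shape is the
same: dump the in-memory state at a timestep boundary, reload it on restart.

/* Minimal sketch of user-level checkpointing.  The state struct,
 * model size, and file name are hypothetical. */
#include <stdio.h>

#define N 1000000               /* hypothetical model size */

struct state {
    int    step;                /* current timestep */
    double field[N];            /* the in-memory model state */
};

static int save_checkpoint(const char *path, const struct state *s)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    /* State is already in memory; just write it out. */
    size_t ok = fwrite(s, sizeof(*s), 1, f);
    return (fclose(f) == 0 && ok == 1) ? 0 : -1;
}

static int load_checkpoint(const char *path, struct state *s)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;          /* no checkpoint found */
    size_t ok = fread(s, sizeof(*s), 1, f);
    fclose(f);
    return ok == 1 ? 0 : -1;
}

int main(void)
{
    static struct state s;      /* static: too big for the stack */

    if (load_checkpoint("model.ckpt", &s) != 0)
        s.step = 0;             /* no checkpoint: fresh start */

    for (; s.step < 100; s.step++) {
        /* ... advance the model one timestep ... */
        if (s.step % 10 == 0)   /* checkpoint every 10 steps */
            save_checkpoint("model.ckpt", &s);
    }
    return 0;
}

On restart the job re-reads the last checkpoint and loses at most the steps
since it was written; nothing has to be recomputed from scratch.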
> And smaller max. runtimes mean more restarts needed to reach the same
> total runtime...
Yes, any time you are doing something other than the model run (like
checkpointing), your run will take longer. This is another one of those "it
depends" scenarios. If the runtime is 1% longer, but it makes the other users
happier or lessens the loss from an eventual crash, is it worth it?
The 1% number is a target I would design for, based on the workload we
experience (a multitude of different-sized jobs, not one big job). I would
buy a couple of nodes with 3ware cards and run either Lustre or PVFS2 over
them as a place to dump the checkpoints. The filesystem would be mostly
volatile (so redundancy wouldn't be critical), and would more than meet the
reliability requirements of my system (>97%).
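
A quick back-of-envelope check of that 1% target (the numbers here are
purely illustrative, not measurements from our system): suppose a typical
job holds 4 GB of state per node and checkpoints every 2 hours. Staying
under 1% overhead means each checkpoint has to complete in about 72 seconds,
which works out to roughly 55-60 MB/s of sustained write bandwidth per node.
A couple of 3ware-backed nodes running Lustre or PVFS2 should manage that,
provided the scheduler keeps too many jobs from checkpointing simultaneously.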
Craig
--
Craig Tierney ([EMAIL PROTECTED])