Craig Tierney <[EMAIL PROTECTED]> writes:

> Allowing users to run for days or weeks as SOP is begging for failure.

Define failure. Our time limit is typically somewhere around 5 or 6
days. Many codes don't have checkpointing, and it's often simply not
possible to add it because you don't have access to the source code.

With backfill scheduling, short and narrow jobs typically don't have
to wait *that* long, at least with the job mixture we see.

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
