On 04/03/16 06:40, Douglas Eadline wrote:

> Yes, failure needs to be an option.

The Slurm folks have been working on failure management support for a
little while. The idea is that you can have a pool of spare nodes to pick
from, or alternatively bargain with the scheduler for a node that's
currently busy to be added to the job once it comes free, potentially
extending the walltime to make up for the shortfall.
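
To make those two options concrete, here's a toy Python sketch of the
policy. It is not Slurm's API or its actual logic: the names (Job,
replace_failed_node, busy_free_in_min) are made up for illustration, and
extending the walltime by exactly the wait time is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Job:
    nodes: set          # hostnames currently allocated to the job
    walltime_min: int   # remaining walltime limit, in minutes


def replace_failed_node(job, failed, spares, busy_free_in_min):
    """Toy policy: take an idle spare if one exists, otherwise wait for
    the busy node that frees up soonest and extend the walltime to match.

    spares            -- list of idle spare hostnames (hypothetical pool)
    busy_free_in_min  -- dict of busy hostname -> minutes until it is free
    Returns the replacement hostname, or None if nothing is available.
    """
    job.nodes.discard(failed)

    # Option 1: pick a replacement straight from the spare pool.
    if spares:
        new = spares.pop()
        job.nodes.add(new)
        return new

    # Option 2: claim the busy node that comes free soonest, and extend
    # the walltime by the time spent waiting (assumed compensation rule).
    if busy_free_in_min:
        new = min(busy_free_in_min, key=busy_free_in_min.get)
        wait = busy_free_in_min.pop(new)
        job.nodes.add(new)
        job.walltime_min += wait
        return new

    return None
```

With one spare in the pool, the first failure is covered by the spare and
the walltime is untouched; a second failure falls back to waiting on a
busy node and the walltime grows by the wait.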

A better description from someone with higher caffeination is here:

http://slurm.schedmd.com/nonstop.html

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
