Re: [Beowulf] non-stop computing

Christopher Samuel Tue, 25 Oct 2016 21:08:07 -0700

On 26/10/16 14:45, John Hanks wrote:

> I'd suggest making NFS mounts hard, so processes can recover from an NFS
> server reboot.


...plus set the NFS fsid for each export on the server side, so the
exports come back reproducibly each time...
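
A minimal sketch of both suggestions, for illustration only. The hostname
(nfs1), paths, subnet and fsid numbers below are made up, not taken from
any real cluster; check exports(5) and nfs(5) for your distribution.

    # /etc/exports on the NFS server (hypothetical paths and subnet).
    # An explicit, unique fsid per export keeps the filesystem identity
    # stable across server reboots.
    /export/home     10.0.0.0/24(rw,sync,fsid=101)
    /export/scratch  10.0.0.0/24(rw,sync,fsid=102)

    # /etc/fstab on a compute node (hypothetical server name).
    # "hard" makes clients retry indefinitely instead of returning I/O
    # errors while the server is down.
    nfs1:/export/home     /home     nfs  hard  0 0
    nfs1:/export/scratch  /scratch  nfs  hard  0 0

On Linux "hard" is already the default for NFS mounts, so spelling it out
mostly guards against someone having added "soft" earlier.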

PS: I endorse what John said (now that I've finished laughing). I'd
suggest making sure you've at least got ECC memory and RAID though, as
memory and disks are the two parts most likely to go bad.  When we had
clusters with disks in the compute nodes, the disks were the most
frequent failures; now that we run diskless nodes, it's memory DIMMs. :-)

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
