On 26/10/16 14:45, John Hanks wrote:

> I'd suggest making NFS mounts hard, so processes can recover from an NFS
> server reboot.

...plus set the NFS fsid for each export on the server side, so the exports come back with the same filesystem ID each time.

PS: I endorse what John said (now that I've finished laughing). I'd suggest making sure you've at least got ECC memory and RAID though, as memory and disks are the two parts that can go bad. When we had clusters with disks in the compute nodes, the disks were the most frequent failures; now that we run diskless nodes, it's memory DIMMs. :-)

All the best,
Chris

-- 
Christopher Samuel        Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/      http://twitter.com/vlsci
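
As a rough illustration of the two NFS suggestions in this thread (hard mounts on the clients, an explicit fsid on each export), here is a minimal Python sketch that audits both. It assumes Linux-style text in the format of /proc/mounts and /etc/exports. The function names are hypothetical, and the exports parser is deliberately naive: it ignores quoted paths and backslash line continuations.

```python
def soft_nfs_mounts(mounts_text):
    """Return mount points of NFS mounts whose option list contains 'soft'.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4") and "soft" in options.split(","):
            flagged.append(mount_point)
    return flagged


def exports_without_fsid(exports_text):
    """Return export paths whose line carries no 'fsid=' option at all.

    Expects text in /etc/exports format:
    /path client(options) [client(options) ...]
    Assumption: one export per line, no quoting, no continuations.
    """
    missing = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        path, clients = fields[0], fields[1:]
        if "fsid=" not in " ".join(clients):
            missing.append(path)
    return missing
```

On a real system you would feed these the contents of /proc/mounts on a client and /etc/exports on the server. Anything the first function returns is a candidate for remounting with the hard option, and anything the second returns is an export that could be given an explicit fsid= value.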