Re: [Beowulf] non-stop computing

Justin Y. Shi Wed, 26 Oct 2016 06:13:09 -0700

John's post is really funny! But I would endorse only Gavin's
recommendation, because it solves the problem statistically (and correctly).


Justin

On Wed, Oct 26, 2016 at 12:07 AM, Christopher Samuel <[email protected]>
wrote:

> On 26/10/16 14:45, John Hanks wrote:
>
> > I'd suggest making NFS mounts hard, so processes can recover from an NFS
> > server reboot.
>
> ...plus set the NFS fsid for each export server side so they come back
> reproducibly each time...
>
> PS: I endorse what John said (now that I've finished laughing), but I'd
> suggest making sure you've at least got ECC memory and RAID, as those are
> the two parts that can go bad.  When we had clusters with disks in
> compute nodes, those were the most frequent failures; now that we run
> diskless nodes, it's memory DIMMs. :-)
>
> All the best,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: [email protected] Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
> _______________________________________________
> Beowulf mailing list, [email protected] sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

