On Fri, Oct 23, 2009 at 01:01:05PM -0500, Rahul Nabar wrote:

> 2. Some errors are hardware precipitated. Aging, out-of-warranty
> aging, hardware can sometimes need such a reboot compromise for
> one-off random errors.
>
> Maybe all the "nice" clusters out there never have this issue but for
> me it is fairly common. Just confessing.

Why, exactly, are you assuming that your freezes are one-off random
errors due to aging hardware? Sounds like you're either guessing, or
you _are_ doing forensics, but aren't calling it forensics.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf