Jon Forrest <jlforr...@berkeley.edu> wrote:

> I have a rack full of identical compute
> nodes. One of them has become heat sensitive.
> 
> When it's in the warm computer room it crashes.
> I can't even run memtest from the CentOS DVD
> for 2 seconds. However, when this node is
> in my much cooler office everything works
> fine. All the other nodes are working fine
> in the computer room.

Presumably you have already blown the dust out of it and reseated all
the obvious suspect components.

If the BIOS has a "shutdown on overheat" option, its threshold may now
be set low enough to stop the machine in the warmer room.  If you did
not set that value yourself, suspect the motherboard battery, since a
failing battery can corrupt the stored settings: replace it, reset the
BIOS, and all should be well.  If the BIOS has a hardware status
monitor, check there too for out-of-range temperatures.
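
If the node stays up long enough under Linux in the cooler room, the
same readings can be logged from the OS for comparison with a healthy
node.  What follows is only a rough sketch, not a tested tool: it
assumes the kernel exposes the board's sensors under /sys/class/hwmon
(which needs the right sensor driver loaded), the function names are
made up for illustration, and the 70 C limit is an arbitrary
placeholder rather than a figure from any motherboard manual.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {label: degrees_C} for each temp*_input file under root.

    The hwmon sysfs interface reports temperatures in millidegrees Celsius.
    """
    temps = {}
    for sensor in sorted(Path(root).glob("hwmon*/temp*_input")):
        try:
            millideg = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable sensor or non-numeric value: skip it
        try:
            chip = (sensor.parent / "name").read_text().strip()
        except OSError:
            chip = "unknown"
        temps[f"{sensor.parent.name}:{chip}/{sensor.name}"] = millideg / 1000.0
    return temps


def out_of_range(temps, limit_c=70.0):
    """Return only the readings above limit_c (an assumed threshold)."""
    return {label: deg for label, deg in temps.items() if deg > limit_c}


if __name__ == "__main__":
    readings = read_hwmon_temps()
    hot = out_of_range(readings)
    for label, deg in readings.items():
        flag = "  <-- above limit" if label in hot else ""
        print(f"{label}: {deg:.1f} C{flag}")
```

Running this on the suspect node and on one of its good neighbours
should show whether the bad node really runs hotter or merely trips
at a lower threshold.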

(Odd that your machine room is much hotter than your office.)

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
