Jon Forrest <jlforr...@berkeley.edu> wrote:

> I have a rack full of identical compute
> nodes. One of them has become heat sensitive.
>
> When it's in the warm computer room it crashes.
> I can't even run memtest from the CentOS DVD
> for 2 seconds. However, when this node is
> in my much cooler office everything works
> fine. All the other nodes are working fine
> in the computer room.

Presumably you have already blown the dust out of it and reseated all the obvious suspect components.

If the motherboard has a "shutdown on overheat" option, it may now be set low enough to stop the machine in the warmer room. If you didn't explicitly set it to that value, then suspect the motherboard battery: change it, reset the BIOS, and all should be well. If the machine has a hardware status monitor in the BIOS, check there too for out-of-range temperatures.

(Odd that your machine room is much hotter than your office.)

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
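
As a companion to the hardware-monitor check suggested above: if the node stays up long enough under Linux (for example, in the cooler office), the same sensor readings can often be pulled from the kernel's hwmon sysfs files and compared against a healthy node. The following is only a rough sketch, not something from the original thread. It assumes the sensor drivers for the board are loaded and expose `temp*_input` files (in millidegrees Celsius) directly under `/sys/class/hwmon/hwmon*/`; older kernels may place them in a `device/` subdirectory instead. The function names and the 70 °C limit are hypothetical, chosen for illustration.

```python
from pathlib import Path


def read_temps(root="/sys/class/hwmon"):
    """Return {sensor: degrees C} for each readable temp*_input file under root.

    Assumes the hwmon sysfs layout, where each value is an integer in
    millidegrees Celsius.
    """
    temps = {}
    for sensor in sorted(Path(root).glob("hwmon*/temp*_input")):
        try:
            millideg = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable or not reporting a number; skip it
        temps[f"{sensor.parent.name}/{sensor.name}"] = millideg / 1000.0
    return temps


def over_limit(temps, limit_c):
    """Return only the sensors reading at or above limit_c (degrees C)."""
    return {name: deg for name, deg in temps.items() if deg >= limit_c}


if __name__ == "__main__":
    LIMIT_C = 70.0  # hypothetical threshold; use the board's documented limit
    readings = read_temps()
    for name, deg in readings.items():
        print(f"{name}: {deg:.1f} C")
    for name, deg in over_limit(readings, LIMIT_C).items():
        print(f"WARNING: {name} is at {deg:.1f} C (limit {LIMIT_C:.1f} C)")
```

Running the same script on the suspect node and on a known-good neighbor gives a quick side-by-side comparison; a sensor that reads far higher on the bad node, or reports nothing at all, would be a reasonable place to look next.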