On Thu, Jul 26, 2007 at 08:48:35AM -0700, David Mathog wrote:
> "Nathan Moore" <[EMAIL PROTECTED]> wrote
>
> > Earlier this summer, the case fan on one of the machines failed, and the
> > result seems like a cooked motherboard (erratic errors with the integrated
> > NIC).
>
> There should be an automatic shutdown script running to detect
> temperature events and shut down the machine before it is damaged.
> This is what I use on some machines:
>
> ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/sensor_monitor.tar.gz
Depending on the board and kernel, ACPI will also provide these services.
On an FC4 (2.6.14) system, I had to do the following to get that to work:

    echo 90 > /proc/acpi/thermal_zone/THRM/polling_frequency
    echo 80:0:70:65:0 > /proc/acpi/thermal_zone/THRM/trip_points

The first echo caused the auto shutdown to work; the second set the values
I wanted, i.e., shutdown at 80C. Some ACPI cognoscenti said the fact that I
had to "manually enable" the polling/shutdown was an error in that version
of the kernel.

I discovered all this when I came home to that sickening overly-hot
electronics smell, a case *very* hot to the touch, and the CPU at 104C due
to a dead CPU fan. Happily, it took a licking and kept on ticking.

-- 
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
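
For anyone who wants to apply the two writes above at boot, here is a minimal sketch of them wrapped in a shell function. The zone name THRM and the values 90 and 80:0:70:65:0 are the ones from this message; the function name set_thermal_trip and its optional arguments are hypothetical, added only so the paths and values can be overridden. Zone names differ between boards, and this /proc/acpi interface may not exist on other kernels, so the function checks for the files before writing.

```shell
#!/bin/sh
# Sketch only. Assumptions: the zone directory, the polling value, and the
# trip-point string default to the ones quoted in the message; the function
# name and its arguments are hypothetical, not part of any ACPI tool.

set_thermal_trip() {
    zone_dir=${1:-/proc/acpi/thermal_zone/THRM}
    poll_value=${2:-90}
    trip_points=${3:-80:0:70:65:0}

    # Refuse to continue if this kernel or board does not expose the files.
    if [ ! -w "$zone_dir/polling_frequency" ] || [ ! -w "$zone_dir/trip_points" ]; then
        echo "no writable ACPI thermal zone at $zone_dir" >&2
        return 1
    fi

    # First write: turn on polling, which is what made auto shutdown work.
    echo "$poll_value" > "$zone_dir/polling_frequency" || return 1

    # Second write: set the trip points, here with shutdown at 80C.
    echo "$trip_points" > "$zone_dir/trip_points" || return 1
}
```

Run as root, calling set_thermal_trip with no arguments repeats the two echo commands exactly; passing a different zone directory as the first argument covers boards whose zone is not named THRM.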