Is it possible to have the Machine Check Exception (MCE) information saved to disk automatically on the next warm boot?
Long form:

A K7 node crashed yesterday and left an MCE on the screen, which I copied down as:

  CPU 0 machine check exception 0000000000000007
  Bank 1 F000000000000853
  Bank 2 940040000000017A at 00000000001511C0
  Kernel panic, not syncing, Unable to Continue

Copying all of those numbers down is very error-prone. As I understand it, the MCE values stay in the registers of the CPU after the crash and may be retrieved at the next warm boot (via a front panel reset, for instance). But this save does not seem to happen automatically, or at least I could not find anything that looked like an MCE dump in /var/log or /var/log/kernel when the system came up. So I want to set things up, if possible, to save this information to disk.

For what it's worth, this is on a Tyan S2466. On the next warm boot the hardware monitor in the BIOS showed the CPU fan at full speed, but when the OS came up lm_sensors showed it at half speed. I have seen this glitch before on other mysterious crashes, and the only way to clear it seems to be to unplug the unit for 10 minutes, allowing time for the errant bit to fade away.

This is on a 2.6.24.17 kernel.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
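For reference, the two bank values quoted in the message can be sanity-checked by splitting them into fields. The sketch below is not part of the original post. It assumes the values are MCi_STATUS registers and that the K7 follows the common x86 machine check layout (flag bits 57 to 63, error code in bits 0 to 15); the helper name decode_status is hypothetical, and the bit meanings should be checked against the processor documentation before drawing conclusions.

```python
# Minimal sketch: split an MCi_STATUS value into its architectural fields.
# Assumption: the common x86 machine check layout applies to this CPU.
FLAGS = [
    (63, "VAL"),    # register holds valid error information
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected by hardware
    (60, "EN"),     # reporting was enabled for this error
    (59, "MISCV"),  # MCi_MISC holds additional information
    (58, "ADDRV"),  # MCi_ADDR holds a valid address
    (57, "PCC"),    # processor context may be corrupt
]


def decode_status(hex_value):
    """Hypothetical helper: return the flags and code fields of a status word."""
    status = int(hex_value, 16)
    return {
        "flags": [name for bit, name in FLAGS if (status >> bit) & 1],
        "error_code": status & 0xFFFF,                   # bits 0-15
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 16-31
    }


# The two bank values exactly as copied from the screen in the post.
for bank, value in [(1, "F000000000000853"), (2, "940040000000017A")]:
    fields = decode_status(value)
    print("Bank %d: flags=%s error_code=0x%04X"
          % (bank, ",".join(fields["flags"]), fields["error_code"]))
```

Under these assumptions, Bank 2 has ADDRV set, which matches the address printed after "at" on that line, so at least that part of the hand-copied output is self-consistent.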