Hi David,

Apologies for the personal copy, but emails to the list from my new address are being moderated and I suspect the moderator is away at present.
On Tue, 26 Jan 2010 05:46:31 am David Mathog wrote:

> Is it possible to have the Machine Check Exception (MCE) information
> saved to disk automatically on the next warm boot?

Depending on your kernel version it may well do that by default. For instance, both 2.6.20 and 2.6.28 (to pick at random from git) say:

    /* Log the machine checks left over from the previous reset.
       This also clears all registers */
    do_machine_check(NULL, mce_bootlog ? -1 : -2);

Greg mentions mcelog. That will write its output to a file, but if that data doesn't make it to spinning rust before the machine locks up then you're out of luck, as mcelog will have cleared the MCE log as part of its action. :-(

There is also parsemce by Dave Jones [1], which can decode some of the values you got. For instance, for your error I get:

    $ ./parsemce -e 0000000000000007 -b 2 -a 00000000001511C0 -s 940040000000017A
    Status: (7) Machine Check in progress.
    Error IP valid
    Restart IP valid.
    parsebank(2): 940040000000017a @ 1511c0
    External tag parity error
    Correctable ECC error
    Address in addr register valid
    Error enabled in control register
    Memory heirarchy error
    Request: Generic error
    Transaction type : Generic
    Memory/IO : I/O

IIRC that means you took a machine check whilst an MCE was already in progress, which becomes an uncorrectable error, and the box will die.

[1] - http://www.codemonkey.org.uk/projects/parsemce/parsemce.c

If you can upgrade to a current kernel (2.6.3x), you can enable the new EDAC code, which decodes MCEs in the kernel and processes/logs them there. That might yield better information for you (and the messages might even make it to a remote syslog if they don't make it to the local platters).

Best of luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf