----- "Francesco Pietra" <[email protected]> wrote:
> Therefore, is it any software way to check if the CPUs are fully in
> order, including the memory controller? lshw and other software
> provided only partial help in my hands.

Make sure that you have ECC set to MAX in your BIOS; on our SuperMicro
mainboards that enables scrubbing of RAM and CPU caches as well as
detection of ECC memory errors.

For some reason the SuperMicro BIOSes we've had recently have defaulted
to turning ECC off, which isn't particularly useful, especially on
motherboards that can only take ECC memory! We found that out the hard
way recently. You can check the current setting from the output of
dmidecode like this:

  dmidecode | grep -A7 "Physical Memory Array" | grep "Error Correction" | grep ECC

If that prints nothing, no ECC mode is being reported.

Make sure you're also running mcelog to pull any MCE or ECC hardware
reports that the kernel has recorded from the CPUs out to a logfile.
We find that running it with the --k8 and --dmi options is important
for decoding more information about these events.

cheers!
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
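
The dmidecode check described in the message can be wrapped in a small
script so it is easy to run across many nodes. What follows is only a
minimal sketch: the function names ecc_types and ecc_enabled are
hypothetical helpers, not part of dmidecode, and the parsing assumes the
usual dmidecode layout in which each "Physical Memory Array" section
carries an "Error Correction Type:" line and sections are separated by
blank lines or "Handle" lines.

```shell
#!/bin/sh
# Hypothetical helpers (not part of dmidecode): they only parse
# dmidecode-style text supplied on stdin.

# Print the "Error Correction Type" value of every Physical Memory Array
# section found on stdin, one value per line.
ecc_types() {
    awk '
        /^Physical Memory Array/ { in_array = 1; next }
        /^Handle / || /^$/       { in_array = 0 }
        in_array && /Error Correction Type:/ {
            sub(/^[ \t]*Error Correction Type:[ \t]*/, "")
            print
        }
    '
}

# Succeed only if at least one memory array was found and every array
# reports a type containing "ECC" (e.g. "Single-bit ECC", "Multi-bit ECC").
ecc_enabled() {
    types=$(ecc_types)
    [ -n "$types" ] || return 1
    ! printf '%s\n' "$types" | grep -qv 'ECC'
}

# Example use (dmidecode normally needs root):
#   dmidecode | ecc_enabled || echo "ECC appears to be off on $(hostname)"
```

Reading from stdin rather than calling dmidecode inside the functions
keeps the check usable on saved dmidecode dumps collected from other
machines.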
