Joe Landman wrote:
a very interesting one. I wonder how many people have scrubbing
turned on in their cluster, and how many use mcelog to monitor the ECC
rate.
We do on clusters we ship/build. I specifically run tests to flush
out the memory errors. Sadly, memtest86 only gets the "obvious" errors,
Backing up what Joe says, we very often ask customers to send us their
mcelog output when they report hardware problems.
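
For anyone who wants a quick look at the ECC counts on a node alongside mcelog, here is a rough sketch. It does not parse mcelog output at all. It assumes the Linux kernel's EDAC driver is loaded for your memory controller, so that the per-controller ce_count (corrected) and ue_count (uncorrected) files exist under /sys/devices/system/edac/mc. The function name read_edac_counts is my own, purely for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    Assumption: the EDAC driver is loaded, so root contains mc0, mc1, ...
    directories, each with ce_count and ue_count files. If it is not
    loaded, the result is simply an empty dict.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Run from cron on each node and diffed against the previous reading, this gives a crude error rate. The counters reset on reboot or driver reload, so a drop in the count should not be read as an improvement.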
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf