Joe Landman wrote:

a very interesting one. I wonder how many people have scrubbing turned on in their cluster, and how many use mcelog to monitor the ECC rate.

We do on clusters we ship/build. I specifically run tests to flush out the memory errors. Sadly, memtest86 only gets the "obvious" errors,

Backing up what Joe says, we very often ask customers to send us mcelog when they report hardware problems.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
