On Mon, Mar 30, 2009 at 01:11:20AM -0400, Mark Hahn wrote:

>> Could those of you running ECC memory give me an updated figure on the
>> number of errors detected/corrected per day per system?
>
> we replace dimms which show > 1000 corrected ECCs per day
> (or any overflows, for which counts are inaccurate, or any uncorrectable
> errors.)

These systems are a couple of generations old, right? I think I have Linux set up to record single-bit errors, and the rate I get is basically zero on, uh, 5 terabytes of modern RAM, at sea level.

When I installed some new memory I had a few systems with modest numbers of single-bit upsets, and the vendor was happy to swap DIMMs until the problem went away. I think he also does that during his factory burn-in.

-- greg