On Tue, Mar 31, 2009 at 12:14:06AM -0400, Mark Hahn wrote:
> using edac? I toyed with mcelog before that, but never really got much
> traction until edac came with an updated kernel.
Yes, EDAC.
-- greg
___
Beowulf mailing list, Beowulf@beowulf.org
we replace dimms which show > 1000 corrected ECCs per day
(or any overflows, for which counts are inaccurate, or any uncorrectable
errors.)
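
For readers who want to apply the replacement rule above mechanically, here is a rough Python sketch, not anyone's production script. It assumes the counters come from the Linux EDAC sysfs tree under /sys/devices/system/edac/mc, where each mcN directory exposes cumulative ce_count and ue_count files. Those counters are per memory controller and cumulative since the last reset, so a per-day rate has to be derived from two samples. The function names, the `overflowed` flag, and how an overflow would be detected are hypothetical; overflow reporting is driver specific and is left to the caller here.

```python
from pathlib import Path

# Threshold quoted in the thread: more than 1000 corrected errors per day.
CE_PER_DAY_LIMIT = 1000


def ce_rate_per_day(prev_ce, curr_ce, elapsed_seconds):
    """Convert two cumulative corrected-error samples into a per-day rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (curr_ce - prev_ce) * 86400 / elapsed_seconds


def should_replace(ce_per_day, overflowed=False, ue_count=0):
    """Return True if any replacement criterion from the thread is met.

    `overflowed` is a hypothetical flag supplied by the caller; an overflow
    means the counts are inaccurate, so it triggers replacement on its own.
    """
    return ce_per_day > CE_PER_DAY_LIMIT or overflowed or ue_count > 0


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Read cumulative (ce_count, ue_count) per memory controller.

    Returns an empty dict if the EDAC tree is absent (driver not loaded).
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    counts = {}
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers with unreadable counters
        counts[mc.name] = (ce, ue)
    return counts
```

One would sample read_edac_counts() periodically (from cron, say), difference successive ce_count values with ce_rate_per_day(), and pass the result to should_replace(). Mapping a controller-level count back to an individual DIMM needs the finer-grained EDAC entries and is not covered by this sketch.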
These systems are a couple of generations old, right?
waaait a minute - I think I gave the wrong impression. we have about
13 TB of this gen hardware (ye
/Could those of you running ECC memory give me an updated figure on
the number of errors detected/corrected per day per system? /
we replace dimms which show > 1000 corrected ECCs per day (or
any overflows, for which counts are inaccurate, or any
uncorrectable errors.)
That seems a remarkably
On Mon, Mar 30, 2009 at 01:11:20AM -0400, Mark Hahn wrote:
>> /Could those of you running ECC memory give me an updated figure on the
>> number of errors detected/corrected per day per system? /
>
> we replace dimms which show > 1000 corrected ECCs per day
> (or any overflows, for which counts are
> -----Original Message-----
> From: beowulf-boun...@beowulf.org
> [mailto:beowulf-boun...@beowulf.org] On Behalf Of Mark Hahn
> Sent: Sunday, March 29, 2009 10:11 PM
> To: ariel sabiguero yawelak
> Cc: Beowulf@beowulf.org
> Subject: Re: [Beowulf] Memory errors poll
>
/Could those of you running ECC memory give me an updated figure on the
number of errors detected/corrected per day per system? /
we replace dimms which show > 1000 corrected ECCs per day
(or any overflows, for which counts are inaccurate, or any
uncorrectable errors.)
I have an old figure o
Hi all.
This is not a direct HPC question per se, but your clusters are an
excellent source for the information I need, so here it goes:
/Could those of you running ECC memory give me an updated figure on the
number of errors detected/corrected per day per system? /
We are working on self-healin