On Thu, Feb 26, 2009 at 3:28 AM, john_re john_re-at-fastmail.us |PDX
Linux| <...> wrote:
> Do you use ECC RAM? Do you have any data about failure rates?
>
> I'm evaluating this for a system with 8GB DRAM, and
> http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction
> says
> "Tests [ecc] give widely varying error rates, but about 10^-12 upset/bit-hr
> is typical, roughly one bit error, per month, per gigabyte of memory.
>
> In most computers used for serious scientific or financial computing and
> as servers, ECC is the rule rather than the exception, as can be seen by
> examining manufacturers' specifications."
>
>
> So, by that figure, 8GB of DRAM would see about 8 errors per month,
> i.e. about one every 3-4 days.
>
> What rates do you have?

Under normal operation in a data center environment on high quality
server-class hardware, I'll see one or two ECC corrected single bit
errors per quarter on hosts with 24GB+ of RAM.  When RAM is on its way
out, those rates go WAY up even though it's still functional.

Under harsher circumstances (e.g. non-data center) the rates are
probably higher, but I have no hard data on them.
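
To put the two sets of numbers side by side, here is a rough sketch of
the arithmetic. It assumes the quoted figure of about one bit error per
month per gigabyte, a 30-day month, and a 3-month quarter; the function
names are made up for illustration, and the results are back-of-envelope
estimates, not measurements.

```python
# Back-of-envelope comparison of the quoted error rate with the
# rate observed on the ECC hosts described above.
# Assumptions: ~1 error per GB per month (the quoted figure),
# a 30-day month, and a 3-month quarter.

QUOTED_ERRORS_PER_GB_MONTH = 1.0
DAYS_PER_MONTH = 30
MONTHS_PER_QUARTER = 3


def expected_errors_per_month(ram_gb, rate=QUOTED_ERRORS_PER_GB_MONTH):
    """Errors per month predicted for a host with ram_gb of memory."""
    return ram_gb * rate


def days_between_errors(ram_gb, rate=QUOTED_ERRORS_PER_GB_MONTH):
    """Average number of days between errors at the given rate."""
    return DAYS_PER_MONTH / expected_errors_per_month(ram_gb, rate)


def observed_rate_per_gb_month(errors_per_quarter, ram_gb):
    """Convert 'N errors per quarter on a ram_gb host' to errors/GB/month."""
    return errors_per_quarter / MONTHS_PER_QUARTER / ram_gb


if __name__ == "__main__":
    # The 8GB system from the question.
    print("8GB, quoted rate: %.1f errors/month, one every %.2f days"
          % (expected_errors_per_month(8), days_between_errors(8)))

    # A 24GB host at the quoted rate vs. 1-2 corrected errors per quarter.
    predicted = expected_errors_per_month(24) * MONTHS_PER_QUARTER
    print("24GB, quoted rate: %.0f errors/quarter" % predicted)
    for seen in (1, 2):
        print("observed %d/quarter -> %.4f errors/GB/month"
              % (seen, observed_rate_per_gb_month(seen, 24)))
```

At the quoted rate a 24GB host would log on the order of 72 errors per
quarter, so one or two per quarter is well over an order of magnitude
below that figure.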

After corrupting many TB of data on one of my home systems (yes,
including backups) due to creeping memory problems, I think it will be
a while before I skip ECC for cost reasons again.

  -- Steve
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug