Paul McHale wrote:
> Thanks for the info.  It sounds like I would be crazy to not use ECC memory.

That all depends on things like the soft failure rate of the memory
used and how much of it you have.

> I tend to leave the PC running 24/7.  That alone might make the ECC worth
> while.

If there is a non-zero soft failure rate, and all memory has a soft
failure rate, then the more memory you have the more likely it is that
you will see a correctable failure.  A machine with a small amount of
memory (64 MB?) of good quality at low altitude might go a long time
(year?) without any failures.  A machine with a large amount of memory
(64 GB?) of low quality at high altitude would probably see frequent
(weekly?) errors.  (Yes, altitude has an effect, because exposure to
cosmic radiation increases with altitude.)  Most machines fall between
those extremes and will have an error rate in between too.  There are
so many variables that it is hard to quantify.
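
To make the scaling above concrete, here is a rough sketch in Python.
It assumes soft errors arrive independently at a constant rate per GB
per hour (a Poisson model).  Both the model and the rate used in the
example are assumptions for illustration only, not measured values;
the function names are made up for this sketch.

```python
import math


def expected_soft_errors(rate_per_gb_hour, memory_gb, hours):
    """Expected number of soft errors: rate times memory size times time."""
    return rate_per_gb_hour * memory_gb * hours


def prob_at_least_one(rate_per_gb_hour, memory_gb, hours):
    """Chance of seeing one or more errors, assuming Poisson arrivals."""
    expected = expected_soft_errors(rate_per_gb_hour, memory_gb, hours)
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    # HYPOTHETICAL rate, chosen only to show the scaling. Real rates
    # depend on memory quality, altitude and many other variables.
    HYPOTHETICAL_RATE = 1e-3  # errors per GB per hour (assumed)
    WEEK_HOURS = 24 * 7
    for memory_gb in (0.0625, 1.0, 64.0):  # 64 MB, 1 GB, 64 GB
        p = prob_at_least_one(HYPOTHETICAL_RATE, memory_gb, WEEK_HOURS)
        print(f"{memory_gb:8.4f} GB: P(at least one error in a week) = {p:.4f}")
```

Whatever the real rate is, the expected count grows linearly with
both memory size and uptime, which is the point of the paragraph
above.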

And for a failure to matter it must occur in program code or data
that the system actually accesses.  An error in a free block of RAM
might not cause any trouble.  An error in a user space program might
crash that program or, worse, cause silent data corruption.  An error
in kernel space might crash the system.

On Linux I don't have an easy way to measure the soft error rate.
(Hints accepted!)  But on commercial systems with ECC I frequently see
corrected errors reported in the logs.  Some machines never have
errors.  Others are always reporting errors.  But they are corrected
and no one notices.
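
One possible hint, offered as an assumption rather than something
verified on the machines discussed here: on Linux kernels with an EDAC
driver loaded for the memory controller, corrected and uncorrected
error counters are exposed in sysfs under /sys/devices/system/edac/mc/
as ce_count and ue_count.  A minimal Python sketch that reads them
(the function name is made up for illustration):

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"corrected": n, "uncorrected": n}}.

    Returns an empty dict if the directory does not exist, for
    example when no EDAC driver is loaded or the memory is non-ECC.
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    # Memory controllers appear as mc0, mc1, ... under the EDAC root.
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(f"{name}: corrected={c['corrected']} uncorrected={c['uncorrected']}")
```

Sampling these counters periodically and dividing the change by the
elapsed time would give a rough corrected-error rate for that machine.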

Does that mean everybody needs it?  Well, for a dual boot desktop
machine with 128 MB of RAM which reboots daily between Linux and MS, I
would expect not.  So as with most things in life it really all
depends and you just have to choose.

Having said all of that, I will say that all of my servers which run
24/7 have ECC memory in them and they are very reliable.  The disk
drive is the least reliable part for me and I use RAID to offset that.
ECC is barely more expensive than non-ECC.  I always look for ECC when
possible[1].  Unfortunately something known as MQH[2] means that in
the consumer market ECC is hard to find.  As the old joke goes, you
don't have to run faster than the bear.  You just have to run faster
than your buddy also running from the bear.  With the prevalence of MS
systems the chipset vendors have been making cheaper and less reliable
components.

Bob

[1] http://www.linuxjournal.com/article.php?sid=4247
[2] http://www.linuxjournal.com/modules.php?op=modload&name=NS-lj-issues/issue79&file=modules.php?op=modload&name=NS-lj-issues/issue79&file=4247s2
