Paul McHale wrote:
> Thanks for the info. It sounds like I would be crazy to not use ECC memory.

That all depends on things like the soft failure rate of the memory used and how much of it you have.

> I tend to leave the PC running 24/7. That alone might make the ECC worth
> while.

If there is a non-zero soft failure rate, and all memory has a soft failure rate, then the more memory you have the more likely it is that you will see a correctable failure. A machine with a small amount of memory (64 MB?) of good quality at low altitude might go a long time (a year?) without any failures. A machine with a large amount of memory (64 GB?) of low quality at high altitude would probably see frequent (weekly?) errors. (Yes, altitude has an effect due to space radiation.) Most machines fall between those extremes and will have an error rate in between too. There are so many variables that it is hard to quantify.

And for a failure to matter, it must occur in program code or data that the system actually accesses. An error in a free block of RAM might not cause any trouble. An error in a user space program might crash that program or, worse, silently corrupt data. An error in kernel space might crash the system.

On Linux I don't have an easy way to measure the soft error rate. (Hints accepted!) But on commercial systems with ECC I frequently see corrected errors in the logs. Some machines never have errors. Others are always reporting errors. But they are corrected and no one notices.

Does that mean everybody needs it? Well, for a dual-boot desktop machine with 128 MB of RAM that reboots daily between Linux and MS, I would expect not. So as with most things in life it really all depends and you just have to choose.

Having said all of that, I will say that all of my servers which run 24/7 have ECC memory in them and they are very reliable. The disk drive is the least reliable part for me and I use RAID to offset that. ECC is barely more expensive than non-ECC. I always look for ECC when possible[1].
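As a rough illustration of the memory-size argument above, here is a minimal Python sketch. It assumes soft errors arrive independently at a constant rate per megabyte per hour (a simple Poisson model), which ignores memory quality, altitude, and whether the affected cell is ever read. The rate constant is a made-up placeholder and not a measurement, and the function names are only for this example.

```python
import math

# HYPOTHETICAL rate, chosen only for illustration: soft errors per MB per hour.
# Real rates depend on the parts, the altitude, and more; measure, don't trust this.
HYPOTHETICAL_RATE = 1e-6

HOURS_PER_YEAR = 24 * 365


def expected_soft_errors(memory_mb, hours, errors_per_mb_hour):
    """Expected error count, assuming a constant independent per-MB rate."""
    return memory_mb * hours * errors_per_mb_hour


def prob_at_least_one(memory_mb, hours, errors_per_mb_hour):
    """Poisson probability of seeing one or more errors in the interval."""
    mean = expected_soft_errors(memory_mb, hours, errors_per_mb_hour)
    return 1.0 - math.exp(-mean)


if __name__ == "__main__":
    for label, mb in (("64 MB", 64), ("64 GB", 64 * 1024)):
        n = expected_soft_errors(mb, HOURS_PER_YEAR, HYPOTHETICAL_RATE)
        p = prob_at_least_one(mb, HOURS_PER_YEAR, HYPOTHETICAL_RATE)
        print(f"{label}: expected {n:.2f} errors/year, P(at least one) = {p:.3f}")
```

Under this model the expected error count scales linearly with both memory size and uptime, so at any fixed rate a 24/7 machine with a lot of memory is the case where ECC has the most to correct.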
Unfortunately something known as MQH[2] means that in the consumer market ECC is hard to find. As the old joke goes, you don't have to run faster than the bear. You just have to run faster than your buddy also running from the bear. With the prevalence of MS systems the chipset vendors have been making cheaper and less reliable components.

Bob

[1] http://www.linuxjournal.com/article.php?sid=4247
[2] http://www.linuxjournal.com/modules.php?op=modload&name=NS-lj-issues/issue79&file=modules.php?op=modload&name=NS-lj-issues/issue79&file=4247s2