Re: [Beowulf] Curious about ECC vs non-ECC in practice

2011-05-24 Thread Lux, Jim (337C)
> -----Original Message-----
> From: David Mathog [mailto:mat...@caltech.edu]
> Sent: Tuesday, May 24, 2011 11:38 AM
> To: Lux, Jim (337C); beowulf@beowulf.org
> Subject: RE: [Beowulf] Curious about ECC vs non-ECC in practice
>
> Jim Lux posted:
>
> > "The Therac-25 Accidents" (Postscript) or (P

Re: [Beowulf] Curious about ECC vs non-ECC in practice

2011-05-24 Thread David Mathog
Jim Lux posted:
> "The Therac-25 Accidents" (Postscript) or (PDF). This paper is an updated version of the original IEEE Computer (July 1993) article. It also appears in the appendix of my book.
Well that was really horrible. Are car computers ECC? When all they did was engine management a me

Re: [Beowulf] Execution time measurements

2011-05-24 Thread David Mathog
Another message from Mikhail Kuzminsky, who for some reason cannot currently post directly to the list:
BEGIN FORWARD
First of all, I should mention that the effect is observed only for Opteron 2350/OpenSuSE 10.3. Execution of the same job with the same binaries on Nehalem E5520/OpenSuSE 11.

Re: [Beowulf] Curious about ECC vs non-ECC in practice

2011-05-24 Thread Lux, Jim (337C)
This *is* a big problem. I suggest reading some of what Nancy Leveson has written. http://sunnyday.mit.edu/ "Professor Leveson started a new area of research, software safety, which is concerned with the problems of building software for real-time systems where failures can result in loss of li

Re: [Beowulf] Curious about ECC vs non-ECC in practice

2011-05-24 Thread Joe Landman
On 05/24/2011 11:41 AM, David Mathog wrote:
> Joe Landman wrote:
>
>> I am wondering about this for larger systems.
>
> Your post makes me wonder about ECC in much smaller systems, like
> dedicated single computers controlling machinery or medical devices.
> Some really nasty things could result fr

Re: [Beowulf] Curious about ECC vs non-ECC in practice

2011-05-24 Thread David Mathog
Joe Landman wrote:
> I am wondering about this for larger systems.
Your post makes me wonder about ECC in much smaller systems, like dedicated single computers controlling machinery or medical devices. Some really nasty things could result from "move cutting head in X (int32 value) mm" after the