At 02:16 PM 5/13/2008, Håkon Bugge wrote:
At 19:17 13.05.2008, Perry E. Metzger wrote:
So another question is, how can you reliably test any of this stuff?
It isn't like you can reliably induce single bit errors and see if the
hardware catches them. (A special memory module that let you test
would be a wonderful thing, but I've never even heard of such a thing.)


More on upsets..

Here's an interesting paper from Boeing from the late 90s that asserts that a leading cause of these upsets is atmospheric neutrons. It gives rates, too (see also the link below to the presentation, which uses some of this data).

http://www.boeing.com/assocproducts/radiationlab/publications/SEU_at_Ground_Level.pdf

Looks like for 4M DRAMs, 1E-12 upsets per bit-hour is a nice round number (Table 4).
Some data from Fermilab with 160 Gbit of DRAM showed 2.5 upsets/day. Extrapolating linearly by bit count (always dangerous with these kinds of radiation effects data, but I'll plunge in regardless), a workstation with 4-8 Gbyte (32-64 Gbit) of DRAM might see something like 0.5 to 1 upset per day.
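For anyone who wants to redo the arithmetic with their own memory size, here is a rough sketch of the two estimates above. It assumes upsets scale linearly with bit count and that 1 Gbit = 1e9 bits and 1 Gbyte = 8e9 bits; the function names are made up for illustration and are not from either paper.

```python
# Back-of-envelope soft-error estimates. Assumptions (not from the papers):
# linear scaling with bit count, decimal units (1 Gbit = 1e9 bits).
BITS_PER_GBIT = 1e9
BITS_PER_GBYTE = 8e9


def upsets_per_day_from_rate(bits, rate_per_bit_hour):
    """Expected upsets per day given a per-bit-hour upset rate."""
    return bits * rate_per_bit_hour * 24.0


def scale_observed(observed_per_day, observed_bits, target_bits):
    """Scale an observed upset rate linearly to a different memory size."""
    return observed_per_day * target_bits / observed_bits


if __name__ == "__main__":
    fermilab_bits = 160 * BITS_PER_GBIT  # 160 Gbit, 2.5 upsets/day observed
    for gbytes in (4, 8):
        bits = gbytes * BITS_PER_GBYTE
        from_table = upsets_per_day_from_rate(bits, 1e-12)
        from_fermilab = scale_observed(2.5, fermilab_bits, bits)
        print(f"{gbytes} Gbyte: {from_table:.2f}/day at 1E-12 per bit-hour, "
              f"{from_fermilab:.2f}/day scaled from the Fermilab figure")
```

The two estimates come from different parts and different eras, so agreement within a factor of a few is about all one should expect.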

Any sort of ECC would catch and correct a single-bit upset like this, of course.

There is a paper from Gary Swift, here at JPL, that discusses how some radiation-induced upsets will be multiple-bit errors by their nature (i.e. imagine a bullet tearing through a bunch of memory cells: more than one gets hit). But this is for Cassini-era Solid State Recorders (e.g. early 90s, late 80s components), and it's in space, where the radiation environment is quite different from the terrestrial one. Swift & Guertin, "In-Flight Observations of Multiple-Bit Upset in DRAMs", IEEE Trans on Nuc Sci, V47, #6, Dec 2000, pp2386-2391.
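To make the single-bit versus multiple-bit distinction concrete, here is a toy sketch of a SECDED (single error correct, double error detect) code: an extended Hamming (8,4) code protecting 4 data bits. This is purely illustrative. Real ECC memory typically protects 64-bit words with wider codes, and the function names and bit layout below are my own assumptions, not any particular memory controller's scheme.

```python
# Toy extended Hamming (8,4) SECDED code. Index 0 holds the overall parity
# bit; indices 1..7 are the classic Hamming(7,4) positions, with parity bits
# at positions 1, 2, 4 and data bits at positions 3, 5, 6, 7.

def encode(nibble):
    """Encode a 4-bit value into a list of 8 codeword bits."""
    d = [(nibble >> i) & 1 for i in range(4)]
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
    return c


def decode(codeword):
    """Return (status, nibble); nibble is None if the word is uncorrectable."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos          # XOR of positions of set bits
    overall = 0
    for bit in c:
        overall ^= bit               # parity over all 8 bits
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected only.
        return "uncorrectable", None
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    one_hit = list(word)
    one_hit[5] ^= 1                  # single upset
    two_hits = list(word)
    two_hits[5] ^= 1                 # two cells hit by the same "bullet"
    two_hits[6] ^= 1
    print(decode(word), decode(one_hit), decode(two_hits))
```

A code like this fixes one flipped bit per word and flags two, but three or more flips in the same word can be silently miscorrected. That is one reason designs often spread physically adjacent cells across different ECC words.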

The Ladbury presentation from MAPLD2002 I posted the link to yesterday talks about the mechanics of the upset.

A fascinating presentation about upsets in avionics (for planes, not spacecraft) from Boeing is here:
http://www.solarstorms.org/SEUavionics.pdf

Look at slide 11, and you'll see that the upset rate is 30 times higher at 30,000 ft than at sea level. Those of you building clusters for observatories in Atacama might want to pay more attention to upsets than those of us close to sea level.

Likewise, the upset rate is higher at high latitudes. (Why yes, it's essential that we build that cluster on a tropical island; otherwise it will cost more for ECC RAM.)

An interesting post on a mailing list:
http://www.cs.york.ac.uk/hise/safety-critical-archive/2001/0140.html

Ladkin discusses some of the potential issues with the Boeing (and other) data.


So there's more about SEUs in memory than anyone on this list ever wanted to know. There's lots more stuff available, although you pretty quickly get into export-controlled territory if you are poking at the limits of the technology.

Jim

--------
James Lux, P.E.
Task Manager, SOMD Software Defined Radios
Flight Communications Systems Section
Jet Propulsion Laboratory
4800 Oak Grove Drive, M/S 161-213
Pasadena CA 91109
USA

+1(818)354-2075 phone
+1(818)393-6875 fax



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
