Joe Landman <land...@scalableinformatics.com> wrote:
>
> There are two that I know of ... memtest and memtest86, one of which is
> a fork of the other. While I like both for coarse testing, we run a
> bunch of GAMESS runs to burn nodes in. Some folks like HPL for this. I
> like large dense matrix computations that pound on the memory subsystem.

It's an interesting question: why don't the common memory testers catch memory failures that user code does? One would think that the folks who maintain these programs would be trying very hard to emulate the loads that real code places on a system. But consider the differences.

1. One limitation of memtest86+, at least the last time I looked, was that it only used a single core in a multiple CPU system. The tests Joe describes above are going to be banging away on all cores at once. Since memtest86+ is in a sense its own operating system, getting it to run on multiple cores would require one heck of a lot of code to be added. So much so that it probably becomes easier to just boot Linux and run the memory tester as a standard application, or as Joe says, just run the actual applications.

2. Most of the modes in most memory testers (generalizing much?) are in some sense sequential. That is, they tend to go through memory in a fixed order. This is not always strictly linear, but it is rarely (ever?) as random as the end user test codes may be. Consequently, they tend not to find failure modes that correspond to multiple memory operations on memory cells at peculiar geometries and intervals.

3. The memory testers don't exercise anything but the memory. This puts a pretty constant but minimal load on the power supply. Pounding away at the same time on the disks (and to a lesser extent the NIC) puts a large and varying load on the power supply, which most likely results in additional noise on all voltages, which may be enough to trigger memory failures in marginal devices.

As an aside, I have often wished I had a "marginal by design" power supply specifically for more realistic stress tests of the electronic components in a bench situation. That is, a power supply that acts with minimal load as if it were under severe time varying load, with respect to noise on the voltages.
This would be useful in finding marginal electronic components, not only memory, but motherboards, NICs, and so forth. No one is going to sell such a PSU, but one could make a sort of "brutalizer box" to plug into a system to emulate this. This would be a small(ish) device into which a power supply connector would be plugged. Once powered up, it would apply a wildly and rapidly varying load on all voltage lines. It need not be a particularly complicated circuit. For instance, run a white noise generator off of the 5V line and use that to drive load transistors on all the supply lines.

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
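
P.S. To make points 1 and 2 above a bit more concrete, here is a rough sketch of what "run the memory tester as a standard application" might look like: one worker process per core, each writing and then verifying its own buffer in a shuffled rather than sequential order. This is only an illustration of the structure, not how memtest86+ or any existing tester works. The names check_region, expected, and run_all are made up for this example, and the buffer size and pass count are arbitrary. Python overhead and CPU caches mean this will not stress DRAM the way a real tester or a GAMESS/HPL run does.

```python
import os
import random
from multiprocessing import Pool


def expected(off, seed):
    """Pattern byte for a given offset (hypothetical pattern, chosen arbitrarily)."""
    return (off * 31 + seed) & 0xFF


def check_region(args):
    """Write a pattern to a private buffer in shuffled order, then verify it.

    args is a (size, seed) tuple. Returns the number of bytes that read
    back with the wrong value.
    """
    size, seed = args
    rng = random.Random(seed)
    buf = bytearray(size)

    # Visit offsets in a pseudo-random order instead of a fixed sweep.
    order = list(range(size))
    rng.shuffle(order)

    # Write pass: each byte gets a value derived from its offset and the seed.
    for off in order:
        buf[off] = expected(off, seed)

    # Read pass, in a different pseudo-random order.
    rng.shuffle(order)
    errors = 0
    for off in order:
        if buf[off] != expected(off, seed):
            errors += 1
    return errors


def run_all(size=1 << 20, passes=4, workers=None):
    """Run check_region in one process per core; return the total mismatches.

    Illustrative driver only: size and passes are assumptions, not tuned values.
    """
    workers = workers or os.cpu_count() or 1
    jobs = [(size, seed) for seed in range(workers * passes)]
    with Pool(processes=workers) as pool:
        return sum(pool.map(check_region, jobs))
```

Calling run_all() from a script's `if __name__ == "__main__":` block would start the workers on all cores at once. A serious version would presumably be written in C, use buffers much larger than the caches, and run alongside disk and network load to get the power supply effect described in point 3.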