Re: [Beowulf] Memory stress testing tools.

2010-12-09 Thread David Kewley
Prentice, You only asked for memory testing programs, but I'm going to go a bit further, to make sure some background issues are covered, and to give you some ideas you might not yet have. Some of this is based on a lot of experience with Dell servers in HPC. Some of my background thoughts on d

Re: [Beowulf] Memory stress testing tools.

2010-12-09 Thread Prentice Bisbal
Jon Forrest wrote: > On 12/9/2010 8:08 AM, Prentice Bisbal wrote: > >> So far, mprime appears to be working. I was able to trigger an SBE in 21 >> hours the first time I ran it. I plan on running it repeatedly for the >> next few days to see how well it can repeat finding errors. > > After it fi

Re: [Beowulf] Memory stress testing tools.

2010-12-09 Thread Prentice Bisbal
Jason Clinton wrote: > On Thu, Dec 9, 2010 at 10:08, Prentice Bisbal > wrote: > > I know breakin well. I used it quite a bit in 2008 when I was > stress-testing my then-new cluster, and sent some feedback to the > developer at the time (last name Shoemaker

Re: [Beowulf] Memory stress testing tools.

2010-12-09 Thread Jon Forrest
On 12/9/2010 8:08 AM, Prentice Bisbal wrote: > So far, mprime appears to be working. I was able to trigger an SBE in 21 > hours the first time I ran it. I plan on running it repeatedly for the > next few days to see how well it can repeat finding errors. After it finds an error how do you figure

Re: [Beowulf] Memory stress testing tools.

2010-12-09 Thread Prentice Bisbal
On 12/07/2010 06:58 PM, David Mathog wrote: > Try stressapptest. > > http://code.google.com/p/stressapptest/ > > Note that it has a bizarre behavior where no matter how high you set N > the sum of their CPU usage is always 100%, even though they are not > all running on one core on a multi-core

Re: [Beowulf] Memory stress testing tools.

2010-12-09 Thread Prentice Bisbal
On 12/08/2010 11:47 AM, Jason Clinton wrote: > On Tue, Dec 7, 2010 at 10:54, Prentice Bisbal > wrote: > > Can any of you recommend a good RAM stress testing tool? > > > We have an open source ISO/netboot image that can stress-test using the > latest Linux kernel EDAC

Re: [Beowulf] Memory stress testing tools.

2010-12-09 Thread Prentice Bisbal
On 12/07/2010 04:35 PM, David Mathog wrote: >> True, but this is a multi-user system, so I don't know which user's code >> is triggering the errors, nor do I know what usage pattern causes the >> errors, so I'm looking for something more consistent. Well, I hope it >> will be more consistent. > > T

Re: [Beowulf] Memory stress testing tools.

2010-12-09 Thread Tony Travis
On 07/12/10 16:54, Prentice Bisbal wrote: > Dear Beowulfers, > > Can any of you recommend a good RAM stress testing tool? > > I have a server with 128GB of RAM that keeps reporting single-bit > errors. Every time this happens, I reseat the DIMMs or swap them around, > and then run some large MPI jo