Hiya Mick;
Thanks for your response. Great ideas. I'll pass them along.
-Skippy
On 1/21/2014 11:42 PM, Mick wrote:
On Tuesday 21 Jan 2014 23:11:24 Skippy wrote:
"Have you ever found a program in linux that allows you to locate bad
DIMMs if you have faults? I’ve tried memtest86, memconf, memtester and
none of them can point out which slot on the motherboard has the bad RAM.
memtest86+ is what I use, but I have not found an application that will
identify and report on its own a faulty module or controller out of a whole
bank of them. Press F2 when it starts, to enable multi-threading (SMP)
support and get the tests done a bit faster.
I know usually you just plug in one at a time. But memtest86 takes
hours and I have…wait for it…16 slots to test DIMMs in for this specific
server with memory failure."
You don't have to test 16 modules one at a time, although you will have to run
the test more than once:
Remove half (8) of the memory modules. Ensure what is left is installed in
the slot combination recommended by the MoBo manufacturer. Test these. If no
fault is found swap them for the other half. As soon as a fault is reported,
remove half of this batch (4) and install the other 4 as recommended by the
MoBo manufacturer. Rinse and repeat. This way you will eventually isolate
the dodgy DIMM, by running the test fewer than 16 times. Usually
errors show in the first round of tests, but sometimes you may need to wait
for more than 8 passes.
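To put a number on "fewer than 16", here is a rough Python sketch of the halving procedure. It is only a simulation under stated assumptions: exactly one faulty module, a fault that shows up reliably whenever that module is installed, and a hypothetical batch_has_errors() callback standing in for a full memtest86+ run on whatever is currently fitted. None of these names come from any real tool.

```python
def find_faulty_module(modules, batch_has_errors):
    """Narrow a set of modules down to the faulty one by repeated halving.

    batch_has_errors is a hypothetical callback that stands in for one full
    memory test of the given batch. Assumes a single, reproducible fault.
    Returns (faulty_module_or_None, number_of_test_runs).
    """
    suspects = list(modules)
    runs = 0
    while len(suspects) > 1:
        half = len(suspects) // 2
        # Test the first half; if it is clean, swap in the other half.
        for batch in (suspects[:half], suspects[half:]):
            runs += 1
            if batch_has_errors(batch):
                suspects = batch  # the fault is somewhere in this batch
                break
        else:
            # Neither half showed errors: the fault did not reproduce.
            return None, runs
    return suspects[0], runs


# Simulated server with 16 modules, one of which is bad (made-up labels).
modules = [f"DIMM{i}" for i in range(1, 17)]
bad = "DIMM11"
found, runs = find_faulty_module(modules, lambda batch: bad in batch)
print(found, runs)
```

With 16 modules this never needs more than 8 test runs (two per halving step, four steps), and fewer if the fault turns up in the first half tested. A fault that sits in a slot or the memory controller rather than in a module breaks the assumptions above.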
Before you start any of this it is a good idea to just reseat the modules one
at a time in case you have some dirt or oxidisation in any of the contacts.
That could save a lot of hours ...
Make sure you have marked clearly which batches have shown no errors - if you
mix them up you will have to start from the beginning. I know I am stating
the obvious, but I have been there with colleagues who like to tidy up other
people's work space <sigh>.