Try running Memtest86+. It's a CD that you boot from, and it tests the memory. Leave it running for a few hours to check whether the problem is the RAM or the sockets. I am not sure how to test the CPU.
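Before rebooting into a memory tester, it may also help to see how much memory the running kernel attributes to each NUMA node, since that could show which node has lost its bank. The following is only a rough sketch, assuming the kernel exposes per-node `meminfo` files under `/sys/devices/system/node`. The function names are made up for illustration, and a node with no memory may not be listed at all on some kernels. It shows only what the kernel sees; it cannot tell a bad DIMM or socket from a bad memory controller.

```python
import glob
import re

# Matches lines such as "Node 0 MemTotal:  4194304 kB" in a per-node meminfo file.
_MEMTOTAL = re.compile(r"^Node\s+(\d+)\s+MemTotal:\s+(\d+)\s+kB", re.MULTILINE)


def parse_memtotal(text):
    """Return {node_id: MemTotal in kB} for every MemTotal line found in text."""
    return {int(node): int(kb) for node, kb in _MEMTOTAL.findall(text)}


def read_node_memory(base="/sys/devices/system/node"):
    """Collect MemTotal per NUMA node from sysfs (assumed path; Linux only)."""
    totals = {}
    for path in sorted(glob.glob(base + "/node[0-9]*/meminfo")):
        with open(path) as fh:
            totals.update(parse_memtotal(fh.read()))
    return totals


def nodes_without_memory(totals):
    """Return the sorted ids of nodes whose reported MemTotal is zero."""
    return sorted(node for node, kb in totals.items() if kb == 0)


if __name__ == "__main__":
    totals = read_node_memory()
    for node, kb in sorted(totals.items()):
        print("node %d: %d MB" % (node, kb // 1024))
    print("nodes with no memory:", nodes_without_memory(totals))
```

If one node reports far less memory than the others, that is the bank to test first, for example by moving known-good modules into it.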
On Tue, Jan 13, 2009 at 10:26 AM, Francesco Pietra <francesco.pie...@accademialucchese.it> wrote:

> Hi:
>
> I am posting here from a suggestion on the Debian amd64 site. My
> original posting to the mainboard factory/vendor in Europe only
> resulted in uninteresting suggestions, and they did not answer any
> more.
>
> My question is directed to the attention of users familiar with
> multisocket UMA-type mainboards based on 875 dual opteron AMD CPU. My
> own is Supermicro H8QC8 with chipset nVidia CK804 and AMD 8132, driven
> by Debian Linux amd64 lenny.
>
> One of the CPUs has suddenly lost viability to its 4-slots memory bank
> (shut down the machine in order, the problem arose on next loading
> Linux). Still, the CPU cores are OK, hypertransport links are fully
> working, parallelization to both Amber 10 and NWChem 5.1 is fully
> provided, but one of the CPUs must be slower, having to borrow memory
> from the other banks. The hardware status, after a period of complete
> darkness, is described in the attached lshw_deb64_7Jan2009.txt.
>
> As each bank of Kingston DDR1 is filled 2+2+1+1 GB, I identified the
> faulty bank, removed all slots from there, and replaced the 1+1 GB
> slots at another bank with 2 + 2 GB from the faulty bank, so that now
> the computer is at 20GB. The situation is described in the attached
> lshw_deb64_lessCPU2_scrambling1G_2G_CPU4_7Jan2009.txt. Actually,
> identification of the CPU (CPU2) related to the faulty mem bank is
> insecure: I just considered the nearest CPU to the faulty bank. The
> manual is not helpful to this regard.
>
> I understand that, in order to remove non-mainboard causes, I should
> be certain that a CPU has not lost memory control. Since replacing (I
> have one spare second-hand CPU) or scrambling, the CPUs is quite
> troublesome, and risky, in my context (there is very little space
> around the mainboard in the rack that I engineered to accept the
> mainboard). Ventilation is excellent, however.
>
> Therefore, is it any software way to check if the CPUs are fully in
> order, including the memory controller? lshw and other software
> provided only partial help in my hands.
>
> Also any other suggestion would be greatly appreciated.
>
> Thanks for your kind attention
>
> francesco pietra
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

--
Jonathan Aquilina