Couple of points for you to think about.

Just today (like, three minutes before I got to this message)
I read in the Beowulf mailing list the following:

"My first Athlon Asustec K7M system died a few days ago (system froze
up under Linux, started beeping frantically, had to be shut down via
the power supply switch and gave no video signal nor booted after
power cycling). 

Isolated occurrence, or a pattern?"

So, if you are running that, you might want to be aware that some
folks are starting to have problems with Athlons, too... of course,
he hasn't troubleshot the system to identify what died, so it may
have nothing to do with the CPU or MB.

As for the K6-2 system... I've found (from experience, unfortunately)
that motherboards tend to give me more problems than CPUs.  I've
only ever had one CPU go bad on me (a Cyrix PR200+, last year),
and two memory modules of ANY kind (and that includes the
old days, back in 1982, when they were individual chips hand-inserted
into the sockets): one bare chip and one 32 pin SIMM.  On the other
hand, I've had all of these motherboards go bad: a 386-33 MB (burned
out the connection to the SIMMs after three years); my P133 MB
(intermittent stability problems after two years); the MB for the
Cyrix I just mentioned (the CPU and MB failed together, so one took
the other, I think); the motherboard I bought a year and a half ago to
replace the P133 MB (caveat... that MB was damaged by a faulty Matrox
Millennium video card.  It was working fine until the video card went
bad (overheated BADLY... burned my hand when I touched it), and then
it slowly died from that point until I replaced it last month); and
my wife's (nearly new!) AOpen AX5T MB (the only one whose model I can
tell you from here... I'm in the process of RMAing it, since I just
got it in March to replace the PR200+ MB).  That's five motherboards,
one video card, three NICs (all because of the video card, though),
two pieces of memory (one SIMM, one chip), and one CPU.

The 386 MB you might find interesting, though... the actual symptoms
were something like what you were experiencing.  When I turned on the
system, I could count on the fact it would lock up between 8 and 10 minutes
after boot, would only respond to a hard reboot, and from there would work
anywhere from an hour to a day without problems, but would eventually
lock up again... the culprit was of course a broken contact and thermal 
expansion.

Also, I had a similar lock up problem to yours with the motherboard I 
replaced when it finally gave up the ghost.  Sometimes, the system would
lock while booting, sometimes it would run a few hours... but always,
it would lock up.  It ultimately damaged the boot information on my
hard drives along with the partition information... I had to download
a utility to low level test the hard drives to fix it.  Strangely,
the drives came up full of problems until I did a low level write
to wipe the partition information... after that, I rebuilt the partition
information, and the drives are now reporting no errors.

As for how to troubleshoot... if you have access to an additional
Super Socket 7 Machine, I would first swap the memory between the
two machines.  Test that for a few days, then swap the CPUs.  You
should slowly be able to isolate which part is the problem by doing that.
There are also some old DOS utilities (and a few that run under Win95
et al.) that can do integrity testing of memory, and can SOMETIMES
test the CPU.  Try checking some of the download sites
like Download.com, etc.  You may need to install DOS or Windows on the
system to run them, but if you're having this many problems,
you're going to rebuild the system image when you are done anyway.
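
To give a rough idea of what those memory integrity testers do, here is
a minimal sketch in Python.  It is not any particular utility's
algorithm, and the names (PATTERNS, find_mismatches, pattern_test) are
made up for illustration.  It writes a few bit patterns across a buffer
and reads them back.  Note that a user-space program like this only
touches whatever memory the OS hands it, so it cannot target a specific
module the way a boot-time tester can.

```python
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, alternating bits


def find_mismatches(buf, pattern):
    """Return (offset, value) for every byte that differs from pattern."""
    return [(i, b) for i, b in enumerate(buf) if b != pattern]


def pattern_test(size_bytes, patterns=PATTERNS):
    """Write each pattern across a buffer, read it back, collect failures."""
    buf = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected            # write pass
        if buf != expected:          # fast read-back comparison
            # Only walk the buffer byte by byte when something is wrong.
            failures.extend((pattern, off, val)
                            for off, val in find_mismatches(buf, pattern))
    return failures


if __name__ == "__main__":
    bad = pattern_test(8 * 1024 * 1024)  # 8 MiB, an arbitrary size
    print("mismatches found:", len(bad))
```

A clean run proves little by itself.  Intermittent faults like the
ones described above usually need many passes over many hours, which is
why swapping parts between machines is still the more reliable way to
isolate the bad one.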

Bill Ward

-----Original Message-----
From: Charles Galpin [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 28, 1999 3:13 PM
To: [EMAIL PROTECTED]
Cc: recipient.list.not.shown; @nswcphdn.navy.mil
Subject: hardware testing


Hi all. This gets a bit long-winded, but please bear with me.

The short version: what is the best (most intensive) way to test
hardware integrity and to isolate the exact problem?

Yesterday I upgraded my box. I bought a new case/mb/cpu/ram and reused all
the other existing components.

It's an AMD K7 500 w/ 256MB ram btw :)

I am currently running a kernel compile in a while /bin/true; do loop and
using the box at the same time as a "burn in test". It's been running like
this since yesterday sometime. I'm quite happy.

Oh, so I should have mentioned that the old box used to lock up pretty
frequently. I used to attribute it to the scsi warnings I was getting
on bootup, and to a feeling that the drives were running too hot. Hence
the upgrade to a full tower case with three case fans.

Now, the real reason for this post is that I took the old case/mb/cpu/ram
combo (K6-2 450 / 256MB), stuck in a new ide CDROM, a new ide HD, and two
new netgear nics. My intention is to make this a server for my mom, who
lives 1000 miles away. It needs to be reliable.

Well the old mb/cpu/ram combo locked up on me twice already. I ran the
kernel compile loop on this box as well. It made it overnight, but locked
up sometime this morning. I ran it again, and this time it locked up in
just under 3 hours. I am currently running it again with one of the 128MB
dimms removed.

Nothing in the logs when this occurs.
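
Since a hard lock-up leaves nothing in the system logs, one option is to
have the burn-in loop keep its own log and force every line to disk
before carrying on.  The sketch below is only an illustration of that
idea in Python: the names burn_in and _record are hypothetical, and the
command you pass in is whatever workload you are already looping on.

```python
import os
import subprocess
import time


def _record(log, message):
    """Append a timestamped line and push it to disk right away."""
    log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {message}\n")
    log.flush()
    os.fsync(log.fileno())  # ask the OS to write it out before a possible freeze


def burn_in(command, log_path, max_runs=None):
    """Run command over and over, logging the start and exit of each run.

    Stops after max_runs runs (None means no limit) or at the first
    non-zero exit status.  Returns the number of runs started.
    """
    runs = 0
    with open(log_path, "a") as log:
        while max_runs is None or runs < max_runs:
            runs += 1
            _record(log, f"run {runs} start")
            status = subprocess.call(command)
            _record(log, f"run {runs} exit={status}")
            if status != 0:
                break
    return runs
```

Something like burn_in(["make", "bzImage"], "burnin.log"), run from the
kernel source directory, would be one way to use it (the build target
and log name here are just placeholders).  After a freeze, the last
line in the log shows which run was in progress and when it started,
which at least gives a time-to-failure to compare as parts get swapped.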

So, I'd really like to figure out exactly which component needs
replacing. I'm guessing it's likely in the order of ram, then cpu, then
mb? But, I'd also like to be darn sure I test these properly. This really
needs to be reliable. The last box I set up for my mom turned out to have a
memory problem - only found out once it was installed and I was trying to
build something on it. Then lightning took out the modems so I could no
longer remotely admin it (but that's a whole other problem).

I have also just kicked off a program called memtester
http://www.qcc.sk.ca/~charlesc/software/memtester in the hopes of catching
a memory problem.

thanks
charles


-- 
To unsubscribe: mail [EMAIL PROTECTED] with "unsubscribe"
as the Subject.

