Thanks for the info, I'll give these a try. I guess one of my biggest
questions was: does this sound like a hardware issue, or could my system
have gotten hosed by the hard reboots? If it's the latter, I would just
format and reinstall. If it sounds like a hardware issue, I will run the
tests and see if I can find the problem (the system is only 1 year old
and I hate to think I have bad components).

Thanks,
Brad

----- Original Message -----
From: "nate" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 29, 2002 1:35 PM
Subject: Re: Help with system recovery

> Brad Cramer said:
>
> > be going on or how to fix this problem. What should I be looking for in
> > log files? Could it be bad RAM? Any help would be greatly appreciated.
>
> Problems like this are the hardest to track down. There are several
> things you can try to narrow it down.
>
> BEFORE TESTING
> ===============
> Get a null modem cable and configure a console on the serial port of
> your machine. If you're not sure how, search for "linux serial console"
> on most any search engine and a bunch of hits should come up. Connect
> your system to another one running a terminal emulation package (e.g.
> minicom) and log the output to a file (you need to keep the emulation
> software up all the time or messages may get lost).
>
>
> Test 1
> ========
> Exit out of X, then download, compile, and run 2-3 copies of CPUBurn,
> available here:
> http://users.ev1.net/~redelm/
>
> For the first few hours keep a close eye on the system. As the website
> warns, it can cause serious damage to the system if it is not properly
> cooled; there's even been a reported case of a power supply burning out.
> If your system is properly cooled you should be able to run a lot of
> CPUBurn processes and the system won't crash or reboot. If it does crash
> or reboot, stop here. I recommend running this for at least 24 hours.
> Do not use the computer while it is running or it may skew results.
>
> Test 2
> ==========
> Included in the cpuburn package is a memory tester. I recommend running
> this at a different time, but you can run it at the same time. Running
> it at the same time may make it difficult to determine what caused
> the crash (RAM or CPU). I recommend running burnBX or burnMMX with the
> 'P' option (uses 64MB of RAM) and running multiple copies of it (either
> load up screen, or load them in the background with &). If you have
> 512MB of RAM I would load 7 or 8 copies. I recommend running this test
> for about 24 hours as well.
> As before, I recommend not using the computer
> while this is going on.
>
>
> Test 3
> ==========
> Get memtest86 from http://www.memtest86.com/, compile it, make the
> boot disk, and boot the disk. Turn on the advanced tests (see the
> documentation). This test will probably take 72 hours or more.
> Your computer will not be usable while this test is running.
>
> Test 4
> ===========
> Get bonnie++ and run it in a loop. I usually loop it for 72 hours
> to test the disk and controller. Redirect output to a log file so
> you can monitor it. Again, I recommend not using your computer during
> this time.
>
>
> Test 5
> =============
> Since you're using nvidia, I recommend making sure AGP is
> disabled by checking /proc/driver/nvidia/agp. I also recommend
> disabling AGP in X, using the option:
>
>     Option "NvAGP" "0"
>
> in the Device section of your X config, the same place where you define
> the driver.
>
> Then try using the system (with the serial console on the other
> computer) and see if it still locks up.
>
>
> Test 6
> ===============
> My next suggestion is to try another kernel, preferably a 2.2.x kernel,
> which may be difficult if you're using ext3, though you can probably
> put the system in ext2 mode while using 2.2.x. I use 2.2.19 on all
> my systems and don't have lockups. Not too long ago my nvidia system
> rebooted under intensive load, but that was tracked down to a failed
> fan on the cheap video card, which brings me to ..
>
>
> Test 7
> ================
> Perhaps the easiest and least intrusive test. Open the side of
> the case, point a fan (floor fan) at the internals, turn the
> fan on medium or high so a ton of air gets blown into the case, and
> try to use the system to see if it locks up.
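A note on the "run it in a loop and log the output" steps (Tests 1, 2
and 4): here is a minimal sketch of one way to do that in Python. It is
not something from nate's message. The function name run_in_loop, its
parameters, and the log format are all made up for illustration, and the
command you pass in is whatever stress tool you are testing with.

```python
import subprocess
import time


def run_in_loop(cmd, hours, log_path, sleep_s=0.0):
    """Re-run cmd until `hours` have elapsed, appending output to log_path.

    cmd is a list such as ["some-stress-tool", "arg"]; it always runs at
    least once. Returns the number of completed runs.
    """
    deadline = time.monotonic() + hours * 3600
    runs = 0
    # Append mode, so an earlier log survives a crash and reboot.
    with open(log_path, "a") as log:
        while True:
            runs += 1
            log.write(f"=== run {runs} started {time.ctime()} ===\n")
            log.flush()  # flush before the child writes to the same file
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
            log.write(f"=== run {runs} exit status {result.returncode} ===\n")
            log.flush()
            if time.monotonic() >= deadline:
                break
            time.sleep(sleep_s)
    return runs
```

For the 72-hour disk test this could be called as
run_in_loop(["bonnie++", "-d", "/mnt/test"], hours=72, log_path="bonnie.log"),
where /mnt/test is a hypothetical scratch directory on the disk under
test. Keeping the log on a different disk (or watching it over the serial
console) means it is more likely to survive if the disk being tested is
the thing that fails.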
>
> As you can probably see, the procedure for tracking down a system
> crash isn't easy, or fast. Back when I had my Abit BP6 I spent
> literally 6 months trying different things to solve the crashes,
> only to find out later that the board revision I had came with
> a defect in the voltage regulators. In the process I spent WAY
> more trying to fix the problem than I would have originally if I
> had just gone out and bought a dual P2 instead of trying to go
> cheap shit with Celerons. I bought another board last year, the
> Asus A7A266, which had even worse problems: something with the
> PCI bus or controller created immediate and complete filesystem
> corruption on any disk connected to the system.
>
> Also be sure you have a good quality power supply that provides enough
> power for the system. My AMD Athlon 1300 runs off a PC Power & Cooling
> TurboCool 425ATX. And it helps a lot if the system is connected to
> a battery backup system. Bad power can easily cause lockups and reboots
> without warning (such power problems may not be visible otherwise). If
> it is a power issue, there may be permanent damage to the system already.
>
> nate
>
> --
> To UNSUBSCRIBE, email to [EMAIL PROTECTED]
> with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]