Brad Cramer said:
>
> be going on or how to fix this problem. What should I be looking for in
> log files? Could it be bad RAM? Any help would be greatly appreciated.

Problems like this are the hardest to track down. There are several
things you can try to narrow it down.

BEFORE TESTING
===============
Get a null modem cable and configure a console on the serial port of
your machine. If you're not sure how, run a search for "linux serial
console" on most any search engine and a buncha hits should come up.
Connect your system to another one running a terminal emulation package
(e.g. minicom) and log the output to a file. You need to keep the
emulation software up all the time or messages may get lost.

Test 1
========
Exit out of X, then download, compile, and run 2-3 copies of CPUBurn,
available here:
http://users.ev1.net/~redelm/
For the first few hours keep a close eye on the system. As the website
warns, it can cause serious damage to the system if it is not properly
cooled; there's even been a reported case of a power supply burning out.
If your system is properly cooled you should be able to run a lot of
CPUBurn processes without the system crashing or rebooting. If it does
crash or reboot, stop here. I recommend running this for at least 24
hours. Do not use the computer while it is running or it may skew
results.

Test 2
==========
Included in the cpuburn package is a memory tester. I recommend running
it at a different time from Test 1; you can run both at the same time,
but that may make it difficult to determine what caused a crash (RAM or
CPU). I recommend running burnBX or burnMMX with the 'P' option (uses
64MB of RAM), and running multiple copies of it (either load up screen,
or load them in the background with &). If you have 512MB of RAM I would
load 7 or 8 copies. I recommend running this test for about 24 hours as
well. As before, I recommend not using the computer while this is going
on.

Test 3
==========
Get memtest86 from http://www.memtest86.com/ and compile it, make the
boot disk, and boot from it. Turn on the advanced tests (see the
documentation). This test will probably take 72 hours or more.
Your computer will not be usable while this test is running.

Test 4
===========
Get bonnie++ and run it in a loop; I usually loop it for 72 hours to
test the disk and controller. Redirect the output to a log file so you
can monitor it. Again, I recommend not using your computer during this
time.

Test 5
=============
Since you're using nvidia, I recommend making sure AGP is disabled by
checking /proc/driver/nvidia/agp. I also recommend disabling AGP in X,
using the option:
    Option "NvAGP" "0"
in the Device section of your X config, the same place where you define
the driver. Then try using the system (with the serial console on the
other computer) and see if it still locks up.

Test 6
===============
My next suggestion is to try another kernel, preferably a 2.2.x kernel.
That may be difficult if you're using ext3, though you can probably put
the system in ext2 mode while using 2.2.x. I use 2.2.19 on all my
systems and don't have lockups. Not too long ago my nvidia system
rebooted under intensive load, but that was tracked down to a failed fan
on the cheap video card, which brings me to ..

Test 7
================
Perhaps the easiest and least intrusive test. Open the side of the case,
point a fan (floor fan) at the internals, and turn the fan on medium or
high so a ton of air gets blown into the case. Then try to use the
system and see if it locks up.

As you can probably see, tracking down a system crash isn't easy or
fast. Back when I had my Abit BP6 I spent literally 6 months trying
different things to solve the crashes, only to find out later that the
board revision I had came with a defect in the voltage regulators. In
the process I spent WAY more trying to fix the problem than I would have
originally if I had just gone out and bought a dual P2 instead of trying
to go cheap shit with Celerons.
I bought another board last year, the Asus A7A266, which had even worse
problems: something with the PCI bus or controller created immediate and
complete filesystem corruption on any disk connected to the system.

Also be sure you have a good quality power supply that provides enough
power for the system. My AMD Athlon 1300 runs off a PC Power & Cooling
TurboCool 425ATX. It also helps a lot if the system is connected to a
battery backup system. Bad power can easily cause lockups and reboots
without warning (such power problems may not be visible otherwise). If
it is a power issue, there may already be permanent damage to the
system.

nate
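
For the "run it in a loop and redirect output to a log file" step in
Test 4, here is a minimal sketch in Python. It is only one possible way
to do it, not a tested recipe for any particular machine. The function
name run_in_loop, the log file name, and the bonnie++ arguments shown in
the usage note are assumptions made for illustration; check the bonnie++
man page for the options that fit your setup.

```python
import os
import subprocess
import time


def run_in_loop(cmd, duration_s, log_path):
    """Re-run cmd until duration_s seconds have elapsed.

    Output of every run is appended to log_path. Returns the number of
    completed runs. (run_in_loop is a hypothetical helper name.)
    """
    deadline = time.monotonic() + duration_s
    runs = 0
    with open(log_path, "a") as log:
        # A new run is only started while we are still before the deadline;
        # a run already in progress is allowed to finish.
        while time.monotonic() < deadline:
            log.write(f"=== run {runs + 1} started {time.ctime()} ===\n")
            log.flush()  # keep our header ahead of the child's output
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
            runs += 1
            log.write(f"=== run {runs} exit status {result.returncode} ===\n")
            log.flush()
            # Ask the OS to push the log to disk, since the point of the
            # test is that the machine may lock up at any moment.
            os.fsync(log.fileno())
    return runs
```

A 72 hour run might then look like
run_in_loop(["bonnie++", "-d", "/mnt/test"], 72 * 3600, "bonnie.log"),
where /mnt/test stands in for a directory on the disk being tested. A
nonzero exit status in the log, or a log that simply stops partway
through a run, gives you a rough idea of when things went wrong.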