On Thu, Mar 08, 2007 at 08:30:57AM -0800, michael wrote:
> Hello,
>
> Have an etch box that does nothing but rsync data with another.
> About every other day or so, the box will completely freeze.
> Everything, screen blank, no keyboard, and the hard drive light
> is on solid.
> I can hard reboot it and it comes up, and there is nothing in the logs that
> suggest anything.
> The root system is an mdadm raid 5 array, and everytime I reboot
> it from a crash, the array is always degraded. It auto rebuilds itself,
> and away it goes again. A few days later, it will lock up.
>
> I have no idea where to start looking for problems. I'm pretty sure its gotta
> be hardware, but not sure where to look first.
> Any suggestions would be great!
>

AIUI, the order from most likely to least likely failure is:
power supply, hard drives, memory, other stuff.

Power supplies are hard to test without equipment, unless you know
you've got sensors set up properly. But it's still worth a shot --
set up lm-sensors and look at your voltages. If they're more than
+/- 5% from spec, then start with a new power supply.

Hard drives should generally leave some kind of log entries right
before they go down, and with raid you shouldn't see a lock-up,
unless the drives are sharing a controller, maybe. If the drives
are SMART-enabled, then check their SMART data.

I think memory errors are pretty much impossible to diagnose through
any method other than swapping sticks in a systematic way.

Good luck,

A
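
P.S. For the +/- 5% voltage check, here is a minimal Python sketch.
It assumes the usual nominal ATX rails (+3.3V, +5V, +12V). The
readings are made-up numbers typed in by hand, not parsed from any
sensors output, and out_of_spec is just a hypothetical helper name.

```python
# Nominal rail voltages (assumption: standard ATX rails).
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}


def out_of_spec(readings, tolerance=0.05):
    """Return {rail: percent deviation} for rails beyond the tolerance."""
    bad = {}
    for rail, measured in readings.items():
        nominal = NOMINAL_RAILS[rail]
        deviation = (measured - nominal) / nominal
        if abs(deviation) > tolerance:
            bad[rail] = round(deviation * 100, 1)
    return bad


if __name__ == "__main__":
    # Hypothetical readings, copied by hand from whatever the sensors report.
    sample = {"+3.3V": 3.31, "+5V": 4.70, "+12V": 12.2}
    for rail, pct in out_of_spec(sample).items():
        print(f"{rail} is {pct:+.1f}% from nominal")
```

With those sample numbers only the +5V rail is flagged, since 4.70V
is about 6% low. Sensor chips are often mislabelled or miscalibrated,
so treat a bad reading as a hint and not as proof.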