Mark Knecht writes:
> Do I just watch the logs looking for problems? I have no way of
> knowing right now whether this was a disk problem that's going to come
> back, a 1 time deal due to power, or something else entirely.
>
> As these cheap machines that don't use RAID what's the right way to
> go? emerge -e @world and then wait for the next event? Do nothing and
> wait?
Emerge smartmontools, then:

  smartctl -H /dev/sda            # overall health: what the drive thinks about itself
  smartctl -t short /dev/sda      # start a short self-test
  (wait; smartctl tells you roughly how many minutes the test takes)
  smartctl -l selftest /dev/sda   # see the results
  smartctl -t long /dev/sda       # start an extended (long) self-test
  (wait a lot longer)
  smartctl -l selftest /dev/sda   # see the results

You can keep working in the meantime. The self-tests run inside the drive
itself, so there should be little or no noticeable performance impact.

The self-test log will look something like this:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%             2275  -
# 2  Extended offline    Completed without error       00%             2270  -
# 3  Extended offline    Completed without error       00%             1799  -
# 4  Extended offline    Completed without error       00%              197  -
# 5  Extended offline    Completed without error       00%               26  -

If there is a '-' in the rightmost column (LBA_of_first_error), the disk
found no errors. If there is a number instead, it is the position (LBA) of
the first error.

There's also badblocks, which checks every block and prints the bad ones:

  badblocks -sv /dev/sda     # read-only test
  badblocks -svn /dev/sda    # non-destructive read-write test (device must not be mounted)

In case of a bad block, the drive should remap it to a spare one. Maybe this
already happens in read-only mode; I am not sure.

Also watch for errors in syslog or dmesg; there should be some when bad
blocks are being accessed.

Wonko
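For checking that self-test log from a script (e.g. from cron), here is a
minimal sketch. It assumes the column layout shown above, with
LBA_of_first_error as the last field of every numbered entry; the function
name check_selftest_log is hypothetical. It reads the output of
`smartctl -l selftest` on stdin, prints every entry that recorded an error
LBA, and exits with status 1 if there was at least one such entry.

```shell
#!/bin/sh
# Hypothetical helper: scan `smartctl -l selftest` output read from stdin.
# Assumption: each self-test entry starts with "#" plus the test number,
# and its last field is LBA_of_first_error ("-" when no error was found).
# Exit status: 0 if no entry recorded an error LBA, 1 otherwise.
check_selftest_log() {
    awk '
        BEGIN { bad = 0 }
        # Self-test entries look like "# 1  Short offline ..." or "#10 ...".
        /^# *[0-9]+/ {
            # Last column is LBA_of_first_error; "-" means no error recorded.
            if ($NF != "-") {
                print "error recorded: " $0
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

Usage, relying on the pipeline's exit status being that of the last command:

  smartctl -l selftest /dev/sda | check_selftest_log || echo "self-test errors logged"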