Mark Knecht writes:

> Do I just watch the logs looking for problems? I have no way of
> knowing right now whether this was a disk problem that's going to come
> back, a 1 time deal due to power, or something else entirely.
> 
> As these cheap machines that don't use RAID what's the right way to
> go? emerge -e @world and then wait for the next event? Do nothing and
> wait?

Emerge smartmontools, then:

smartctl -H /dev/sda  # get overview of what the drive thinks about itself

smartctl -t short /dev/sda     # start short self test
Wait
smartctl -l selftest /dev/sda  # see results

smartctl -t long /dev/sda      # start long self test
Wait a lot longer
smartctl -l selftest /dev/sda  # see results

You can keep working in the meantime; the self-test runs in the
background and should have little noticeable performance impact. The log
will look something like this:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description   Status              Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error   00%    2275       -
# 2  Extended offline   Completed without error   00%    2270       -
# 3  Extended offline   Completed without error   00%    1799       -
# 4  Extended offline   Completed without error   00%     197       -
# 5  Extended offline   Completed without error   00%      26       -

If you have a '-' in the rightmost column, the disk has found no errors. If 
there is a number, then it's the position (LBA) of the first error.
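
If you'd rather not eyeball the log every time, something like this could
pick out the suspicious entries. It's only a sketch: failed_selftests is a
name I made up, and it assumes the log format shown above. Any entry that is
not "Completed without error" with a '-' in the last column gets printed,
which also covers a test that is still running or was aborted.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# prints every log entry that did not complete cleanly.
# Exit status: 0 if all entries are clean, 1 otherwise.
failed_selftests() {
    awk '
        # Log entries start with "# 1", "# 2", ... ("#10" for two digits).
        /^# *[0-9]+ / {
            if ($0 !~ /Completed without error/ || $NF != "-") {
                print
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (needs root; /dev/sda is just the example device from above):
#   smartctl -l selftest /dev/sda | failed_selftests || echo "look at the drive"
```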

There's also badblocks, which reads every block and reports the bad 
ones: badblocks -sv /dev/sda

badblocks -svn /dev/sda will do a non-destructive read-write test; run it 
only on a disk that is not mounted. In case of a bad block, the drive 
should replace it with a spare one. Maybe this already happens in 
read-only mode, I am not sure.
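
To see whether the drive has actually swapped sectors for spares, you can
look at the SMART attribute table. A rough sketch, assuming an ATA drive that
reports attribute 5 (reallocated sectors) and 197 (pending sectors) in the
usual `smartctl -A` layout with the raw value in the tenth column.
sector_counts is a made-up helper name, and some drives (SSDs in particular)
use different IDs or names.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# attribute name and raw value for IDs 5 and 197.
# Assumes the standard ATA attribute table, where field 1 is the ID,
# field 2 the name and field 10 the raw value.
sector_counts() {
    awk '$1 == 5 || $1 == 197 { print $2, $10 }'
}

# Usage (needs root; /dev/sda is just the example device from above):
#   smartctl -A /dev/sda | sector_counts
# Comparing the numbers before and after a badblocks run shows whether
# anything was remapped.
```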

Also watch for errors in syslog or dmesg; the kernel should log some when 
bad blocks are being accessed.
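
A simple filter can save some scrolling here. This is a sketch under
assumptions: disk_errors is a made-up name, and the pattern list is only a
guess at the messages that typically accompany unreadable sectors, so it is
neither complete nor free of false positives.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints lines that
# commonly show up when a disk cannot read a sector.
# The patterns are an assumption, not an exhaustive list.
disk_errors() {
    grep -E -i 'I/O error|medium error|unrecovered read error'
}

# Usage:
#   dmesg | disk_errors
# grep exits with status 1 when nothing matches, so "no output" and a
# non-zero status together mean the log looks clean for these patterns.
```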

        Wonko
