Thanks for your advice. I did not know about smartctl.

I bought a new (bigger!) disk, and everything works fine now.

> On Thu, 18 Oct 2012 14:03:22 -0600,
> Bob Proulx <b...@proulx.com> wrote:
> Valentin Lorentz wrote:
> > Oh... you must be right...
> > I had some issues with this hard drive last month, but I thought it
> > was only on my SWAP partition.
> > Sorry for the report.
> 
> The problem is that all of those partitions are on the same disk drive
> mechanism, and the failures are idiosyncratic: every failure is
> different.  Whenever I start to see disk failures I often see them
> grow, get worse, and very soon affect everything on the drive.  I try
> to remove the drive from service quickly, since many times for me they
> have failed shortly thereafter.
> 
> Just to take care of the bug accounting I am going to go ahead and
> mark the bug as done since it seems you have found the underlying
> problem.  But feel free to continue to mail relevant discussion here
> as I have done with this message.  I am not trying to stop the
> conversation.  I am only trying to update the accounting.
> 
> If I ran into this problem I would ensure that I have a good backup as
> quickly as possible.  Then after ensuring my backup of the important
> parts of my system I would scan the files that I suspect of having bad
> blocks.  I might simply cat the files to /dev/null or I might use dd
> on the raw /dev/sda device to try to test if I can read all of the
> blocks on the disk.  Along the way I will monitor the kernel logs and
> look to see if it is throwing errors.  Often I find serious failures
> when I am looking for them directly.
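
A minimal sketch of the read test described above, assuming GNU dd as
shipped on Debian.  The function name read_check is made up for
illustration; it is not part of any package.  It reads every block of
a file or device, throws the data away, and reports whether dd hit a
read error.

```shell
#!/bin/sh
# read_check TARGET (hypothetical helper): read all of TARGET and
# discard the data.  TARGET may be a regular file or a raw device such
# as /dev/sda; reading a raw device normally requires root.
read_check() {
    target=$1
    # bs=1M keeps the number of read calls low.  dd exits non-zero if
    # it cannot open or read the input.
    if dd if="$target" of=/dev/null bs=1M 2>/dev/null; then
        echo "OK: $target"
    else
        echo "READ ERROR: $target"
        return 1
    fi
}

# Example use, as root, followed by a look at the kernel log:
#   read_check /dev/sda
#   dmesg | tail -n 50
```

A full read of a large disk can take hours, and a failing drive may
get worse under the load, so it makes sense to do this only after the
backup is safe.
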
> 
> Just as a few hints about disk drive diagnosis I always install the
> 'smartmontools' package.  It provides a framework for routine disk
> drive self-testing.  It only works with spinning disks and not with
> the newer SSDs which don't need the same diagnosis.  I will also
> mention it isn't for VMs either since the underlying host is where
> this type of diagnosis should reside.  Then I run the selftests
> routinely.  I also run selftests whenever I suspect a problem.
> 
>   # apt-get install smartmontools
> 
> Then you can test the drive explicitly.  Here are some useful tidbits:
> 
>   # smartctl -i /dev/sda
>   # smartctl -l error /dev/sda
>   # smartctl -l selftest /dev/sda
>   # smartctl -t short /dev/sda
>   # smartctl -l selftest /dev/sda
>   # smartctl -t long /dev/sda
>   # smartctl -l selftest /dev/sda
> 
> I haven't found SMART to be a great predictor but it is useful to
> confirm the diagnosis after the fact.  But just the same I set up
> routine tests.  Since it isn't completely obvious let me show my
> configuration as a hint:
> 
> In file /etc/smartd.conf I have:
>   # Monitor all attributes, enable automatic offline data collection,
>   # automatic Attribute autosave, and start a short self-test every
>   # weekday between 2-3am, and a long self test Saturdays between 3-4am.
>   # Ignore attribute 194 temperature change.
>   # Ignore attribute 190 airflow temperature change.
>   # On failure run all installed scripts to send email about problems.
>   /dev/sda -a -o on -S on -s (S/../../[1-5]/03|L/../../6/03) -I 194 -I 190 -m root -M exec /usr/share/smartmontools/smartd-runner
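
Since the -s schedule is the least obvious part, here is a rough
sketch of how it is evaluated.  smartd matches the regular expression
against strings of the form T/MM/DD/d/HH, where T is the test type
(S for short, L for long), d is the day of the week with 1 = Monday,
and HH is the hour.  The helper name due_tests is made up for
illustration, and grep -E only approximates smartd's own matching.

```shell
#!/bin/sh
# The schedule regex from the smartd.conf line above.
schedule='(S/../../[1-5]/03|L/../../6/03)'

# due_tests MM/DD/d/HH (hypothetical helper): print which test types
# the regex selects for that time, or "none".
due_tests() {
    due=""
    for t in S L; do
        # -x requires the whole string to match.
        if printf '%s\n' "$t/$1" | grep -Eqx "$schedule"; then
            due="$due$t"
        fi
    done
    echo "${due:-none}"
}

due_tests 10/18/4/03                 # a Thursday, hour 03  -> S
due_tests 10/20/6/03                 # a Saturday, hour 03  -> L
due_tests 10/21/7/03                 # a Sunday             -> none
due_tests "$(date +%m/%d/%u/%H)"     # the current hour
```

Hour 03 covers 03:00 to 03:59, so with this regex both the weekday
short test and the Saturday long test start in that hour.
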
> 
> In /etc/default/smartmontools I have:
>   start_smartd=yes
> 
> And then restart smartmontools:
> 
>   # service smartmontools restart
> 
> Hope this helps,
> Bob
