Thanks for your advice. I did not know about smartctl. I bought a new (bigger!) disk, and everything works fine now.

On Thu, 18 Oct 2012 14:03:22 -0600, Bob Proulx <b...@proulx.com> wrote:
> Valentin Lorentz wrote:
> > Oh... you must be right...
> > I had some issues with this hard drive last month, but I thought it
> > was only on my SWAP partition.
> > Sorry for the report.
>
> The problem is that all of those partitions are on the same disk drive
> mechanism and the failures are idiosyncratic where every failure is
> uniquely different.  But whenever I start to see disk failures I often
> see them grow and get worse and very soon affect everything on the
> drive.  I try to remove the drive from service quickly since many
> times for me they have failed shortly thereafter.
>
> Just to take care of the bug accounting I am going to go ahead and
> mark the bug as done since it seems you have found the underlying
> problem.  But feel free to continue to mail relevant discussion here
> as I have done with this message.  I am not trying to stop the
> conversation.  I am only trying to update the accounting.
>
> If I ran into this problem I would ensure that I have a good backup as
> quickly as possible.  Then after ensuring my backup of the important
> parts of my system I would scan the files that I suspect of having bad
> blocks.  I might simply cat the files to /dev/null or I might use dd
> on the raw /dev/sda device to try to test if I can read all of the
> blocks on the disk.  Along the way I will monitor the kernel logs and
> look to see if it is throwing errors.  Often I find serious failures
> when I am looking for them directly.
>
> Just as a few hints about disk drive diagnosis I always install the
> 'smartmontools' package.  It provides a framework for routine disk
> drive self-testing.  It only works with spinning disks and not with
> the newer SSDs which don't need the same diagnosis.  I will also
> mention it isn't for VMs either since the underlying host is where
> this type of diagnosis should reside.  Then I run the selftests
> routinely.  I also run selftests whenever I suspect a problem.
>
>   # apt-get install smartmontools
>
> Then you can test the drive explicitly.  Here are some useful tidbits:
>
>   # smartctl -i /dev/sda
>   # smartctl -l error /dev/sda
>   # smartctl -l selftest /dev/sda
>   # smartctl -t short /dev/sda
>   # smartctl -l selftest /dev/sda
>   # smartctl -t long /dev/sda
>   # smartctl -l selftest /dev/sda
>
> I haven't found SMART to be a great predictor but it is useful to
> confirm the diagnosis after the fact.  But just the same I set up
> routine tests.  Since it isn't completely obvious let me show my
> configuration as a hint:
>
> In file /etc/smartd.conf I have:
>   # Monitor all attributes, enable automatic offline data collection,
>   # automatic Attribute autosave, and start a short self-test every
>   # weekday between 2-3am, and a long self test Saturdays between 3-4am.
>   # Ignore attribute 194 temperature change.
>   # Ignore attribute 190 airflow temperature change.
>   # On failure run all installed scripts to send email about problems.
>   /dev/sda -a -o on -S on -s (S/../../[1-5]/03|L/../../6/03) -I 194 -I 190 -m root -M exec /usr/share/smartmontools/smartd-runner
>
> In /etc/default/smartmontools I have:
>   start_smartd=yes
>
> And then restart smartmontools:
>
>   # service smartmontools restart
>
> Hope this helps,
> Bob
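
For anyone trying to read the -s expression in the quoted smartd.conf, here is a rough sketch that checks a few sample time strings against it. It assumes the T/MM/DD/d/HH format described in smartd.conf(5), where the whole string has to match the extended regular expression. The helper name matches_schedule and the sample strings are only my illustrations and are not part of smartmontools.

```shell
# Sketch: see which "T/MM/DD/d/HH" strings the quoted -s expression matches.
# Assumed field meanings, per smartd.conf(5):
#   T  = test type (S = short, L = long)
#   MM = month, DD = day of month
#   d  = day of week (1 = Monday ... 7 = Sunday)
#   HH = hour of the day, 00 to 23
# matches_schedule is a hypothetical helper, not a smartmontools command.

SCHEDULE='(S/../../[1-5]/03|L/../../6/03)'

matches_schedule() {
    # Anchor the pattern so the entire string has to match.
    printf '%s\n' "$1" | grep -Eq "^${SCHEDULE}\$"
}

# Thu 18 Oct 2012 is day 4, Sat 20 Oct 2012 is day 6.
for candidate in S/10/18/4/03 S/10/18/4/02 L/10/20/6/03 S/10/20/6/03; do
    if matches_schedule "$candidate"; then
        echo "$candidate: scheduled"
    else
        echo "$candidate: not scheduled"
    fi
done
```

The first and third strings should come out as scheduled and the other two as not scheduled. Since HH is 03 in both alternatives, both tests appear to fall in the 3am hour, so the "2-3am" in the quoted comment for the short test may be off by one hour.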