For the last several days, I've been experiencing strange lock-ups and crashes, which I suspect may be due to hardware failure, although I'm not sure how to diagnose this further.
I don't think it's an OS issue, since the problem sometimes occurs at POST, or at least before the bootloader (GRUB) comes up. The failures seem to cluster: I've had repeated hangs within a few minutes of each other, and then days of trouble-free running.

I suspect an HDD or controller problem. A little while ago I didn't get an actual hang (although I'd had several in the minutes before), but some applications temporarily stopped responding, and I saw this in syslog:

Sep 7 19:36:08 localhost kernel: [ 193.761021] ata1: drained 65536 bytes to clear DRQ.
Sep 7 19:36:08 localhost kernel: [ 193.876071] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 7 19:36:08 localhost kernel: [ 193.876077] ata1.00: failed command: READ DMA
Sep 7 19:36:08 localhost kernel: [ 193.876085] ata1.00: cmd c8/00:e8:51:00:98/00:00:00:00:00/e2 tag 0 dma 118784 in
Sep 7 19:36:08 localhost kernel: [ 193.876087] res 40/00:01:01:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
Sep 7 19:36:08 localhost kernel: [ 193.876091] ata1.00: status: { DRDY }
Sep 7 19:36:08 localhost kernel: [ 193.876127] ata1: soft resetting link
Sep 7 19:36:14 localhost kernel: [ 199.076056] ata1: link is slow to respond, please be patient (ready=0)
Sep 7 19:36:18 localhost kernel: [ 203.921020] ata1: SRST failed (errno=-16)
Sep 7 19:36:18 localhost kernel: [ 203.921034] ata1: soft resetting link
Sep 7 19:36:24 localhost kernel: [ 209.121055] ata1: link is slow to respond, please be patient (ready=0)
Sep 7 19:36:28 localhost kernel: [ 213.967058] ata1: SRST failed (errno=-16)
Sep 7 19:36:28 localhost kernel: [ 213.967072] ata1: soft resetting link
Sep 7 19:36:34 localhost kernel: [ 219.168044] ata1: link is slow to respond, please be patient (ready=0)
Sep 7 19:36:59 localhost kernel: [ 244.977129] ata1.01: link status unknown, clearing UNKNOWN to NONE
Sep 7 19:37:00 localhost kernel: [ 245.385606] ata1.00: configured for UDMA/100
Sep 7 19:37:00 localhost kernel: [ 245.385623] ata1: EH complete

The last three lines seem to be from when the system began behaving normally again. This certainly looks bad; does anyone know what it means?

I'm running SMART tests, but so far I haven't seen anything suspicious there, although I don't really grok the SMART information. The machine is a nearly four-year-old Acer Aspire laptop. The HDD, as reported by SMART, is:

Model Family:     Hitachi Travelstar 5K100
Device Model:     HTS541060G9AT00
Serial Number:    MPB3PAXMG2SR2G
Firmware Version: MB3OA60A

Celejar
--
foffl.sourceforge.net - Feeds OFFLine, an offline RSS/Atom aggregator
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator

Archive: http://lists.debian.org/20100907202014.26aec005.cele...@gmail.com
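For anyone trying to read the `cmd` and `res` lines in the syslog excerpt, here is a minimal Python sketch that splits libata's taskfile dump into named register fields. It assumes the field order libata uses when printing these lines (command/feature:nsect:lbal:lbam:lbah/hob_.../device), 512-byte sectors, and a 28-bit LBA command such as READ DMA. The opcode table is a small illustrative subset, and the helper names (`parse_taskfile`, `decode_lba28_cmd`, `decode_status`) are hypothetical.

```python
import re

# Field order assumed from libata's error report:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# In a "res" dump, "command" holds the status register and "feature" the error register.
FIELDS = ["command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device"]

# Small illustrative subset of ATA opcodes, not a complete table.
ATA_OPCODES = {0xC8: "READ DMA", 0xCA: "WRITE DMA",
               0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT", 0xB0: "SMART"}

# ATA status register bits, most significant first.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
               (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR")]


def parse_taskfile(tf: str) -> dict:
    """Split e.g. 'c8/00:e8:51:00:98/00:00:00:00:00/e2' into named byte values."""
    parts = re.split(r"[/:]", tf.strip())
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return {name: int(p, 16) for name, p in zip(FIELDS, parts)}


def decode_lba28_cmd(tf: str, sector_size: int = 512) -> dict:
    """Decode a 28-bit LBA command such as READ DMA (assumes 512-byte sectors)."""
    f = parse_taskfile(tf)
    sectors = f["nsect"] or 256  # a count of 0 means 256 sectors for 28-bit commands
    # LBA bits 24-27 are carried in the low nibble of the device register.
    lba = ((f["device"] & 0x0F) << 24) | (f["lbah"] << 16) | (f["lbam"] << 8) | f["lbal"]
    return {
        "opcode": ATA_OPCODES.get(f["command"], hex(f["command"])),
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "lba": lba,
        "lba_mode": bool(f["device"] & 0x40),  # bit 6 of device register: LBA addressing
    }


def decode_status(tf: str) -> list:
    """Names of the status bits set in a 'res' dump (status is the first byte)."""
    status = parse_taskfile(tf)["command"]
    return [name for bit, name in STATUS_BITS if status & bit]


if __name__ == "__main__":
    print(decode_lba28_cmd("c8/00:e8:51:00:98/00:00:00:00:00/e2"))
    print(decode_status("40/00:01:01:4f:c2/00:00:00:00:00/a0"))
```

Applied to the `cmd` line in the log, this gives READ DMA of 0xe8 = 232 sectors (232 × 512 = 118784 bytes, matching `dma 118784`) starting at LBA 43515985. The `res` status byte 0x40 decodes to DRDY alone, matching the `status: { DRDY }` line.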
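To put a number on how long I/O was stalled during an episode like the one in the syslog excerpt, here is a rough Python sketch. It pairs each libata `exception` line with the next `EH complete` line on the same port and counts the `soft resetting link` lines in between. The episode boundaries are an assumption about how to read the log, not something libata defines, and `eh_episodes` is a hypothetical helper.

```python
import re

# Matches "[ 193.876071] ata1.00: message" inside a syslog line; the ".00"
# device suffix is optional so port-level lines ("ata1: ...") match too.
KMSG = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+)(?:\.\d+)?:\s+(.*)")


def eh_episodes(lines):
    """Return (port, start_s, end_s, soft_resets) tuples.

    Assumption: an episode opens at an 'exception' line and closes at the
    next 'EH complete' line for the same port.
    """
    episodes = []
    pending = {}  # port -> [start timestamp, soft reset count]
    last_ts = 0.0
    for line in lines:
        m = KMSG.search(line)
        if not m:
            continue  # e.g. the "res ..." continuation line has no ataN prefix
        ts, port, msg = float(m.group(1)), m.group(2), m.group(3)
        if ts < last_ts:
            pending.clear()  # kernel timestamps restart at each boot
        last_ts = ts
        if msg.startswith("exception") and port not in pending:
            pending[port] = [ts, 0]
        elif port in pending and msg.startswith("soft resetting link"):
            pending[port][1] += 1
        elif port in pending and msg.startswith("EH complete"):
            start, resets = pending.pop(port)
            episodes.append((port, start, ts, resets))
    return episodes


if __name__ == "__main__":
    import sys
    # Usage: python eh_episodes.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as fh:
        for port, start, end, resets in eh_episodes(fh):
            print(f"{port}: {start:.3f}s -> {end:.3f}s "
                  f"({end - start:.1f}s stalled, {resets} soft resets)")
```

Fed the excerpt above, it reports one episode on ata1 from 193.876 s to 245.386 s (about 51.5 s) with three soft resets. Run over older, rotated syslog files, it could show whether the errors cluster in time, though hangs at POST never reach syslog.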