Package: kernel-image-2.6.8-2-386
Version: 2.6.8-16
Severity: important

Hello,
I've been having the same problem on three firewall boxes: after a certain amount of time (days or weeks), either the filesystems on the hard drive are remounted read-only, or the drive locks up for good (until a reboot) with I/O error messages. I am reporting from this machine, because the others have not been rebooted yet and so are still not working properly (although they still forward/filter packets).

All the firewalls use Seagate ST92011A (20GB 2.5") drives and are based on VIA chipsets. I can't confirm that the boards are identical, as one of the firewalls uses a different motherboard and is currently dead (Input/output error on any command) until its next reboot.

The problem first appeared when I tried the latest 2.4 kernel in Sarge. It seemed to go away when I switched to 2.6.8, but it is still there; the fault just takes much longer to occur. I have a feeling it is a power management problem, with the drive not waking up from deep sleep (I confirmed this experimentally with the 2.4 kernel).

At the moment I am testing the hdparm -B 255 'solution'. Failing that, it's a cron job running ls -l / > /dev/null every 15 minutes :-S

The kernel is from APT, and the modules are loaded by hotplug; no custom stuff. powermgmt-base is installed, but that's about it.

Here is some info:

lspci:
---
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia] (rev 05)
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP]
0000:00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
0000:00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
0000:00:07.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
0000:00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
0000:00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
0000:00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1 (rev 6a)
---

dmesg errors from this machine (followed by a futile attempt to force a remote reboot; is there a better way?):
---
eth1: no IPv6 routers present
eth0: no IPv6 routers present
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
HTB init, kernel part version 3.17
u32 classifier
    OLD policer on
hdc: dma_timer_expiry: dma status == 0x20
hdc: DMA timeout retry
hdc: timeout waiting for DMA
hdc: status timeout: status=0xd0 { Busy }
hdc: drive not ready for command
ide1: reset timed-out, status=0x80
hdc: status timeout: status=0x80 { Busy }
hdc: drive not ready for command
ide1: reset timed-out, status=0x80
end_request: I/O error, dev hdc, sector 33719
end_request: I/O error, dev hdc, sector 33727
end_request: I/O error, dev hdc, sector 33735
end_request: I/O error, dev hdc, sector 33743
end_request: I/O error, dev hdc, sector 33751
end_request: I/O error, dev hdc, sector 33759
end_request: I/O error, dev hdc, sector 33767
end_request: I/O error, dev hdc, sector 33775
end_request: I/O error, dev hdc, sector 33783
end_request: I/O error, dev hdc, sector 33791
end_request: I/O error, dev hdc, sector 33799
end_request: I/O error, dev hdc, sector 33807
end_request: I/O error, dev hdc, sector 33815
end_request: I/O error, dev hdc, sector 33823
end_request: I/O error, dev hdc, sector 33831
end_request: I/O error, dev hdc, sector 33839
end_request: I/O error, dev hdc, sector 33847
end_request: I/O error, dev hdc, sector 33855
end_request: I/O error, dev hdc, sector 33863
end_request: I/O error, dev hdc, sector 33871
end_request: I/O error, dev hdc, sector 33879
end_request: I/O error, dev hdc, sector 33887
end_request: I/O error, dev hdc, sector 33895
end_request: I/O error, dev hdc, sector 18982239
Buffer I/O error on device hdc5, logical block 782337
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250295
Buffer I/O error on device hdc5, logical block 565844
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250303
Buffer I/O error on device hdc5, logical block 565845
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250311
Buffer I/O error on device hdc5, logical block 565846
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17249919
Buffer I/O error on device hdc5, logical block 565797
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17245527
Buffer I/O error on device hdc5, logical block 565248
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17966455
Buffer I/O error on device hdc5, logical block 655364
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 21374327
Buffer I/O error on device hdc5, logical block 655364
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 21374327
Buffer I/O error on device hdc5, logical block 1081348
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 7340167
Buffer I/O error on device hdc1, logical block 917513
lost page write due to I/O error on hdc1
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 7340175
Buffer I/O error on device hdc1, logical block 917514
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 7340183
end_request: I/O error, dev hdc, sector 7602311
end_request: I/O error, dev hdc, sector 10765550
end_request: I/O error, dev hdc, sector 10765552
end_request: I/O error, dev hdc, sector 15082871
end_request: I/O error, dev hdc, sector 33903
Aborting journal on device hdc1.
end_request: I/O error, dev hdc, sector 12786023
end_request: I/O error, dev hdc, sector 12786031
end_request: I/O error, dev hdc, sector 12786039
end_request: I/O error, dev hdc, sector 12786047
Aborting journal on device hdc5.
end_request: I/O error, dev hdc, sector 17245527
end_request: I/O error, dev hdc, sector 17250295
end_request: I/O error, dev hdc, sector 17250311
__journal_remove_journal_head: freeing b_committed_data
end_request: I/O error, dev hdc, sector 10765554
Aborting journal on device hdc3.
ext3_abort called.
EXT3-fs abort (device hdc5): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
journal commit I/O error
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device hdc1) in ext3_reserve_inode_write: Journal has aborted
Remounting filesystem read-only
end_request: I/O error, dev hdc, sector 63
EXT3-fs error (device hdc1) in ext3_dirty_inode: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
ext3_abort called.
EXT3-fs abort (device hdc3): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
end_request: I/O error, dev hdc, sector 1572959
printk: 10 messages suppressed.
Buffer I/O error on device hdc1, logical block 196612
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1572967
Buffer I/O error on device hdc1, logical block 196613
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1572975
Buffer I/O error on device hdc1, logical block 196614
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1572983
Buffer I/O error on device hdc1, logical block 196615
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1572999
Buffer I/O error on device hdc1, logical block 196617
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1573007
Buffer I/O error on device hdc1, logical block 196618
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1573015
end_request: I/O error, dev hdc, sector 1573023
end_request: I/O error, dev hdc, sector 2359591
end_request: I/O error, dev hdc, sector 3932367
end_request: I/O error, dev hdc, sector 4980871
end_request: I/O error, dev hdc, sector 4980935
end_request: I/O error, dev hdc, sector 7340159
end_request: I/O error, dev hdc, sector 7602279
end_request: I/O error, dev hdc, sector 7602319
end_request: I/O error, dev hdc, sector 8126559
end_request: I/O error, dev hdc, sector 8126567
end_request: I/O error, dev hdc, sector 8651263
end_request: I/O error, dev hdc, sector 8651271
end_request: I/O error, dev hdc, sector 9437279
end_request: I/O error, dev hdc, sector 9437295
end_request: I/O error, dev hdc, sector 9437303
end_request: I/O error, dev hdc, sector 12723551
end_request: I/O error, dev hdc, sector 17179991
end_request: I/O error, dev hdc, sector 17180023
end_request: I/O error, dev hdc, sector 17180031
end_request: I/O error, dev hdc, sector 18752895
end_request: I/O error, dev hdc, sector 10796336
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
<-snip-> More of the same
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
fwbox01:~# dmesh > /dmesg.txt
-bash: /dmesg.txt: Read-only file system
fwbox01:~# mount
/dev/hdc1 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/hdc3 on /home type ext3 (rw)
/dev/hdc5 on /var type ext3 (rw)
/dev/hdc6 on /home/adm type ext3 (rw)
usbfs on /proc/bus/usb type usbfs (rw)
fwbox01:~# dmesg > /home/dmesg.txt
-bash: /home/dmesg.txt: Read-only file system
fwbox01:~# mount
/dev/hdc1 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/hdc3 on /home type ext3 (rw)
/dev/hdc5 on /var type ext3 (rw)
/dev/hdc6 on /home/adm type ext3 (rw)
fwbox01:~# hd
hd hdparm
fwbox01:~# hdparm
-bash: /sbin/hdparm: Input/output error
fwbox01:/proc/sys/kernel# cat panic
20
fwbox01:/proc/sys/kernel# cat panic_on_oops
0
fwbox01:/proc/sys/kernel# echo 1 > panic_on_oops
-r--r--r-- 1 root root 0 2005-10-21 17:24 version
-r--r--r-- 1 root root 0 2005-10-21 17:24 vmstat
fwbox01:/proc# echo diemotherfucker > kcore
fwbox01:/proc#
fwbox01:/proc#
fwbox01:/proc#
fwbox01:/proc# cat /dev/hdc > kcore
cat: /dev/hdc: Input/output error
fwbox01:/proc# cat /dev/random > kcore
fwbox01:/dev# ls *mem
kmem mem
fwbox01:/dev# cat /dev/zero > kmem
---

The initial DMA error messages appear now and again and seem to be harmless (though they are probably also PM-related); then, after a few weeks of running, it all goes wrong. :-(

I will be happy to provide more specific information if you tell me how to get it; I don't really know much about hardware/kernel debugging.

HTH,
George B.
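P.S. For reference, the two workarounds mentioned above (hdparm -B 255 and the 15-minute ls -l / cron job) can be written out as a small shell sketch. It rests on my own assumptions and is not a confirmed fix: the drive is /dev/hdc as on this box, and the cron line is meant for a system crontab file such as /etc/cron.d/keep-hdc-awake, a hypothetical file name chosen only for illustration.

```shell
#!/bin/sh
# Sketch of the two keep-awake workarounds from this report.
# Assumptions (not confirmed fixes): the drive is /dev/hdc, and the cron
# line is meant for a system crontab file such as the hypothetical
# /etc/cron.d/keep-hdc-awake.

DISK=${DISK:-/dev/hdc}

# Print the 15-minute keep-awake job in system crontab format
# (minute hour day-of-month month day-of-week user command).
cron_line() {
    printf '%s\n' '*/15 * * * * root ls -l / > /dev/null'
}

# Disable the drive's Advanced Power Management with hdparm -B 255.
# Needs root; the setting may have to be re-applied after each boot.
disable_apm() {
    if ! command -v hdparm >/dev/null 2>&1; then
        echo "hdparm not found" >&2
        return 1
    fi
    if [ "$(id -u)" -ne 0 ]; then
        echo "must be root to change APM settings" >&2
        return 1
    fi
    hdparm -B 255 "$DISK"
}
```

To use it, source the file, run disable_apm once per boot (for example from an init script), and install the job with cron_line > /etc/cron.d/keep-hdc-awake. The file only defines functions, so sourcing it changes nothing by itself.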
-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-2-386
Locale: LANG=en_GB, LC_CTYPE=en_GB (charmap=ISO-8859-1)

Versions of packages kernel-image-2.6.8-2-386 depends on:
ii  coreutils [fileutils]   5.2.1-2      The GNU core utilities
ii  initrd-tools            0.1.81.1     tools to create initrd image for p
ii  module-init-tools       3.2-pre1-2   tools for managing Linux kernel mo

-- no debconf information