Hi, I just ran into some trouble with the I/O subsystem of a new server running Debian Etch (2.6.18-6-vserver-amd64 (Debian 2.6.18.dfsg.1-18etch1)) and was wondering whether it might be related to some SB600 patches that are not yet part of the Debian kernel.
The problem occurs with both disks and even persists after the _entire_ hardware of the server was replaced. It therefore seems unlikely to be a hardware defect in one of the individual disks, the motherboard or the cables involved.

Controller: 00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
            ( full lspci -vv: http://pastesite.com/326/123 )
Disks:      two SAMSUNG HD403LJ (FW: CT100-12) in a software raid1
            ( full smartctl/hdparm output: http://pastesite.com/328/123 )
Complete bootup dmesg output: http://pastesite.com/327/123

The system runs without problems, but after a couple of days or weeks of heavy I/O load one of the following two situations can occur.

a) Error during access to a disk, leading to the step-by-step degradation of the DMA/PIO mode for the disk.

   ata2.00: exception Emask 0x40 SAct 0x1 SErr 0x800 action 0x2 frozen
   ata2.00: tag 0 cmd 0x61 Emask 0x44 stat 0x40 err 0x0 (timeout)
   ata2: soft resetting port
   ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
   ata2.00: configured for UDMA/133
   ata2: EH complete
   ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
   ...
   ata2.00: configured for UDMA/100
   ...
   ata2.00: configured for UDMA/66
   ...
   ...
   ata2.00: configured for PIO4

All of that happens within one second (according to the syslog timestamps). Somewhere in between, raid1 gives up and drops the disk from the array. The disk nevertheless remains accessible after the degradation and can still be used normally.

b) Error during access to a disk, but the DMA/PIO mode is not changed. The disk becomes totally inaccessible until the system is rebooted.

   ata1.00: exception Emask 0x40 SAct 0xf SErr 0x800 action 0x2 frozen
   ata1.00: tag 0 cmd 0x60 Emask 0x44 stat 0x40 err 0x0 (timeout)
   ata1.00: tag 1 cmd 0x60 Emask 0x44 stat 0x40 err 0x0 (timeout)
   ata1.00: tag 2 cmd 0x60 Emask 0x44 stat 0x40 err 0x0 (timeout)
   ata1.00: tag 3 cmd 0x60 Emask 0x44 stat 0x40 err 0x0 (timeout)
   ata1: soft resetting port
   ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
   ata1.00: configured for UDMA/133
   ata1: EH complete
   ata1.00: exception Emask 0x0 SAct 0xc SErr 0x0 action 0x2 frozen
   ata1.00: tag 2 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
   ata1.00: tag 3 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
   ata1: soft resetting port sda, sector 362827470
   sd 0:0:0:0: SCSI error: return code = 0x00040000
   end_request: I/O error, dev sda, sector 362827470

After that the disk can't be accessed in any way, for example:

   # smartctl -d ata -a /dev/sda
   Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

   # hdparm -I /dev/sda
   /dev/sda:
    HDIO_DRIVE_CMD(identify) failed: Input/output error

   # hdparm -w /dev/sda
   /dev/sda:
    HDIO_DRIVE_RESET failed: Inappropriate ioctl for device

Full error output:
   http://pastesite.com/320/123
   http://pastesite.com/329/123
   http://pastesite.com/330/123

Has anyone experienced similar problems in the past?

I noticed that the SB600 driver has undergone some modifications/patches since 2.6.18. Does anyone know whether such driver updates/fixes are usually backported into the Debian stock kernel?

Greetings
Hans
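
P.S. For anyone trying to read the hex fields in the log lines above, here is a minimal Python sketch that decodes them. It rests on a few assumptions that should be checked against the kernel source in use: the Emask bit names follow my reading of the AC_ERR_* flags in include/linux/libata.h, the SStatus value is taken to be printed in hex, and the function names (decode_emask, active_tags, decode_sstatus) are made up for illustration only.

```python
# Assumed mapping of libata error-mask bits (AC_ERR_* in include/linux/libata.h).
# Verify against the kernel version actually running before relying on it.
AC_ERR = {
    0x01: "DEV",
    0x02: "HSM",
    0x04: "TIMEOUT",
    0x08: "MEDIA",
    0x10: "ATA_BUS",
    0x20: "HOST_BUS",
    0x40: "SYSTEM",
    0x80: "INVALID",
}

# ATA opcodes that appear in the logs (both are NCQ commands).
ATA_CMD = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
}


def decode_emask(emask):
    """Return the names of all error bits set in an Emask value."""
    return [name for bit, name in sorted(AC_ERR.items()) if emask & bit]


def active_tags(sact):
    """Return the NCQ tag numbers whose bits are set in SAct."""
    return [tag for tag in range(32) if sact & (1 << tag)]


def decode_sstatus(sstatus):
    """Split a SATA SStatus register value into its DET, SPD and IPM fields."""
    return {
        "det": sstatus & 0xF,          # 3 = device present, link established
        "spd": (sstatus >> 4) & 0xF,   # 1 = 1.5 Gbps, 2 = 3.0 Gbps
        "ipm": (sstatus >> 8) & 0xF,   # 1 = interface active
    }


if __name__ == "__main__":
    print(decode_emask(0x44))      # per-command Emask from the timeouts
    print(active_tags(0xF))        # SAct in situation b)
    print(decode_sstatus(0x123))   # "SStatus 123" from the link-up line
    print(ATA_CMD[0x60], "/", ATA_CMD[0x61])
```

Read this way, SAct 0xf corresponds to the four tags 0-3 that time out in situation b), and SAct 0xc to the two tags 2 and 3 still outstanding after the first reset, which at least matches the per-tag lines in the log.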