Bug#293122: RAID1 fails to detect and report genuine hard drive read fault

Peter Sahlmann Tue, 01 Feb 2005 01:23:13 -0800

Package: mdadm
Version: v1.7.0 - 11 August 2004

Distribution: Debian Sarge
Kernel: 2.6.8-1-386
Hardware: AMD64 2800+ / MSI K8N Neo with NVIDIA nForce3 250Gb Chipset
Hard disk: 2 x Maxtor 7Y250P0 (250GB) IDE
Software:  mdadm - v1.7.0 - 11 August 2004
           RAID1 hda and hdc

Hello!

Problem Description: The RAID1 system was originally tested by
simulating individual drive failures.  This was achieved by
disconnecting each drive in turn and by force-failing devices
with 'mdadm' using the --set-faulty option.  Everything worked as
expected.

Then a real disk fault occurred!  A system monitoring tool
reported that '/dev/hda7' had unreadable (pending) sectors.
Later, I tested the suspect hard disk with a Maxtor drive test
utility, which confirmed a genuine unrecoverable disk read
error.  The 'md' driver had tried to switch to partition
'/dev/hdc7', but the whole 'hdc' disk was not accessible because
of a "dma_timer_expiry" error.  After the faulty 'hda' hard drive
was replaced, the system rebooted, and the RAID1 arrays resynced,
everything worked fine again.  A system log and an mdstat dump are
appended.

It would seem that RAID1 failed to detect and report this error as
would be expected in the circumstances.  Is this a correct
assessment?  Do you have any comments or advice in this case?
Cheers and many thanks in advance.

Peter Sahlmann
Network Administrator


----------------------------------
Log from /var/log/kern.log
----------------------------------

Jan 20 22:00:41 andros kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 20 22:00:41 andros kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=423121589, high=25, low=3691189, sector=423121589
Jan 20 22:00:41 andros kernel: end_request: I/O error, dev hda, sector 423121589
Jan 20 22:00:41 andros kernel: raid1: Disk failure on hda7, disabling device.
Jan 20 22:00:41 andros kernel: ^IOperation continuing on 1 devices
Jan 20 22:00:41 andros kernel: raid1: hda7: rescheduling sector 227803256
Jan 20 22:00:41 andros kernel: raid1: hdc7: redirecting sector 227803256 to another mirror
Jan 20 22:01:02 andros kernel: hdc: dma_timer_expiry: dma status == 0x60
Jan 20 22:01:02 andros kernel: hdc: DMA timeout retry
Jan 20 22:01:02 andros kernel: hdc: timeout waiting for DMA
Jan 20 22:01:02 andros kernel: hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 20 22:01:02 andros kernel:
Jan 20 22:01:02 andros kernel: hdc: drive not ready for command
Jan 20 22:01:02 andros kernel: raid1: hdc7: rescheduling sector 227803256
Jan 20 22:01:02 andros kernel: raid1: hdc7: redirecting sector 227803256 to another mirror
Jan 20 22:01:02 andros kernel: hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 20 22:01:02 andros kernel:
Jan 20 22:01:02 andros kernel: hdc: drive not ready for command
Jan 20 22:01:02 andros kernel: hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
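
For readers who want to check a kern.log for the same pattern, the
following is a minimal Python sketch, not part of the original report.
It separates the raid1 "Disk failure" messages from the low-level IDE
errors seen in the log above.  The regular expressions are assumptions
based only on the wording in this excerpt (kernel 2.6.8); other kernels
may phrase these messages differently.  The names summarize, FAILURE_RE,
IDE_ERROR_RE and SAMPLE are hypothetical.

```python
import re

# Assumption: raid1 announces a failed member with this exact wording,
# as in the 2.6.8 log excerpt above.
FAILURE_RE = re.compile(r"raid1: Disk failure on (\w+), disabling device")

# Assumption: these three IDE messages are the ones of interest; the
# list is taken from the excerpt and is not exhaustive.
IDE_ERROR_RE = re.compile(
    r"kernel: (hd[a-z]): "
    r"(dma_timer_expiry|timeout waiting for DMA|drive not ready for command)"
)


def summarize(lines):
    """Return (failed_members, ide_errors) found in kernel log lines."""
    failed = []      # members that raid1 itself marked as faulty
    ide_errors = {}  # disk name -> list of IDE-level error messages
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            failed.append(m.group(1))
        m = IDE_ERROR_RE.search(line)
        if m:
            ide_errors.setdefault(m.group(1), []).append(m.group(2))
    return failed, ide_errors


# A few lines copied from the log excerpt above.
SAMPLE = [
    "Jan 20 22:00:41 andros kernel: raid1: Disk failure on hda7, disabling device.",
    "Jan 20 22:01:02 andros kernel: hdc: dma_timer_expiry: dma status == 0x60",
    "Jan 20 22:01:02 andros kernel: hdc: timeout waiting for DMA",
    "Jan 20 22:01:02 andros kernel: hdc: drive not ready for command",
]

if __name__ == "__main__":
    failed, ide_errors = summarize(SAMPLE)
    print("members marked faulty by raid1:", failed)
    print("IDE errors by disk:", ide_errors)
```

In this excerpt, raid1 reports a failure for hda7 only; hdc shows
IDE-level errors without a matching raid1 failure line.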

----------------------------------
mdstat dump
----------------------------------

andros:~# cat /proc/mdstat

Personalities : [raid1]
md0 : active raid1 hda1[0] hdc1[1]
      979840 blocks [2/2] [UU]

md3 : active raid1 hda6[0] hdc6[1]
      48829440 blocks [2/2] [UU]

md4 : active raid1 hda7[0] hdc7[1]
      147452480 blocks [2/2] [UU]

md1 : active raid1 hda2[0] hdc2[1]
      1951808 blocks [2/2] [UU]

md2 : active raid1 hda5[0] hdc5[1]
      45897600 blocks [2/2] [UU]
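
The dump above shows every array as [2/2] [UU], i.e. both mirrors
active.  As a rough illustration of how a degraded array could be
spotted from the same output, here is a minimal Python sketch, not part
of the original report.  It assumes the two-line layout shown above (an
"mdN :" line followed by a "blocks [n/m] [UU]" line); the function name
degraded_arrays and the DEGRADED sample are hypothetical, and the
degraded sample was not captured from this system.

```python
import re

ARRAY_RE = re.compile(r"^(md\d+) :")
# [n/m]: n = devices the array should have, m = devices currently active.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(text):
    """Return names of arrays with fewer active devices than expected."""
    degraded = []
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = STATUS_RE.search(line)
        if m and current:
            expected, active = int(m.group(1)), int(m.group(2))
            # "_" marks a missing or failed mirror in the status flags.
            if active < expected or "_" in m.group(3):
                degraded.append(current)
            current = None
    return degraded


# Two arrays copied from the dump above.
HEALTHY = """Personalities : [raid1]
md0 : active raid1 hda1[0] hdc1[1]
      979840 blocks [2/2] [UU]

md4 : active raid1 hda7[0] hdc7[1]
      147452480 blocks [2/2] [UU]
"""

# Hypothetical illustration only: what md4 might look like with hda7 failed.
DEGRADED = """Personalities : [raid1]
md4 : active raid1 hda7[0](F) hdc7[1]
      147452480 blocks [2/1] [_U]
"""

if __name__ == "__main__":
    print(degraded_arrays(HEALTHY))
    print(degraded_arrays(DEGRADED))
```

On a live system the input would come from reading /proc/mdstat
instead of the sample strings.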


-----------------------------------
/etc/fstab
-----------------------------------

# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
/dev/md2        /               ext3    defaults,errors=remount-ro 0       1
/dev/md0        /boot           ext3    defaults        0       2
/dev/md4        /files          ext3    nosuid          0       2
/dev/md3        /home           ext3    defaults        0       2
/dev/md1        none            swap    sw              0       0
/dev/hdd        /media/cdrom    iso9660 ro,user,noauto  0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
/dev/sda1       /mnt/usbstick   vfat    rw,user,noauto,umask=000  0     0
