> difference. The disks are each in removeable containers, which accept
> only 40-pin connectors. The disks are therefore running at UDMA-33 at
Get them out now. I've done a lot of experimenting with RAID5 over IDE
and these things have been a disaster in most of my testing. I built
a RAID5 machine out of IBM Deskstar 18G drives back when 18G was
bleeding edge.
We originally had some drives in these enclosures, but they caused so
many crashes that we removed them. It seems that few of them pass
signals adequately, and they introduce a lot of cable noise. I've
often seen a vendor's drive test program report a drive as failed and
unusable when it was placed in a removable enclosure, when the same
drive would pass all tests cleanly connected directly to a ribbon
cable.
[pasted from kernel messages]
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> hdb: DMA disabled
> ide0: reset: success
>
> (I think these errors are a cable issue, but they do not prevent hda
> working adequately for now)
I'd be more inclined to suspect the removable enclosures. I know
it's a lot of trouble, but it would be very beneficial to get rid of
them.
On the other hand these messages can indicate a failing drive. Usually
they will display endlessly on the screen for a failed drive, but
before a drive fails they often show up occasionally, one at a time.
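One rough way to tell occasional noise from a failing drive is to
count the BadCRC events per device in the kernel log. A minimal
sketch (assuming your messages look like the hda lines quoted above):

```shell
# Count BadCRC errors per IDE device from the kernel log.
# A steady trickle on one device suggests cabling/enclosure trouble;
# an endless stream suggests the drive itself is going.
dmesg | awk '/BadCRC/ { count[$1]++ } END { for (d in count) print d, count[d] }'
```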
Now, I'm not familiar with the RAID code at all, but a glance reveals
that this 'switching cache buffer size' message should be followed by
a message reading 'size now <newsize>'. It looks like the code is
spinning in shrink_stripe_cache while remove_hash'ing before it gets
to the printk that would announce this new size. No, I don't
know why your setup would break, but might it be instructive for
debugging purposes to insert some printk's around line 286 in
raid5.c? Perhaps if you knew where the kernel was locking there
would be a better chance that someone familiar with the RAID code
could see why this was happening.
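Just to sketch what I mean (this is from memory of the raid5.c of
that era, so the exact context and variable names around line 286
will differ in your tree; treat them as placeholders):

```c
/* Hypothetical debugging sketch for drivers/md/raid5.c.
 * The idea is to bracket the suspected remove_hash loop in
 * shrink_stripe_cache with printk's, so the last message that
 * reaches the console tells you where the kernel is spinning.
 */
printk(KERN_DEBUG "raid5: entering shrink_stripe_cache\n");
while (...) {            /* the existing remove_hash loop */
        printk(KERN_DEBUG "raid5: shrink iteration\n");
        ...
}
printk(KERN_DEBUG "raid5: leaving shrink_stripe_cache\n");
```

If the "leaving" message never appears, that would confirm the loop
is where things are stuck.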
Ross Vandegrift
[EMAIL PROTECTED]
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]