Leif Nixon wrote:
Reconstruction. With raid 6, you can recover from single-disk
corruption (as opposed to *failures*, where you get read errors from a
disk; raid 6 can handle two simultaneous disk *failures*).
See section 4 in:
http://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
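A minimal Python sketch of how the P and Q syndromes can locate a single corrupted drive, assuming byte-wise GF(2^8) arithmetic with polynomial 0x11D and generator {02} as in the paper above. If data drive z is corrupted by an error E, then P xor P* = E and Q xor Q* = g^z * E, so z = log_g(Q_err) - log_g(P_err). The function names (gf_mul, syndromes, locate_corruption) are hypothetical, and this works on one byte position rather than whole blocks:

```python
# Assumption: GF(2^8) with polynomial x^8+x^4+x^3+x^2+1 (0x11D), generator g = {02}.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing modulo 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

# Tables of powers of g and their discrete logarithms.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul(x, 2)

def syndromes(data: list[int]) -> tuple[int, int]:
    """P = XOR of all D_i; Q = XOR of g^i * D_i (one byte position)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q

def locate_corruption(data: list[int], p_stored: int, q_stored: int):
    """Return None if the stripe is consistent, "P" or "Q" if only that
    parity block is bad, or the index z of the single corrupted data drive."""
    p_calc, q_calc = syndromes(data)
    p_err = p_calc ^ p_stored    # equals E if data drive z was hit
    q_err = q_calc ^ q_stored    # equals g^z * E
    if p_err == 0 and q_err == 0:
        return None
    if q_err == 0:
        return "P"
    if p_err == 0:
        return "Q"
    z = (LOG[q_err] - LOG[p_err]) % 255   # log_g(g^z * E) - log_g(E)
    if z >= len(data):
        raise ValueError("syndromes do not match a single-drive corruption")
    return z

data = [0x11, 0x22, 0x33, 0x44]
p, q = syndromes(data)
bad = list(data)
bad[2] ^= 0x5A                    # silently corrupt data drive 2
print(locate_corruption(bad, p, q))   # → 2
```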
I just read it.
Just recalculating the parity blocks does give you a consistent raid
stripe, but destroys your data (unless it actually was one of the
parity blocks that was corrupted).
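To make the point about recalculating parity concrete, here is a toy one-byte-per-drive stripe (same GF(2^8) assumptions: polynomial 0x11D, g = {02}; the names are hypothetical). Corrupting a data byte and then regenerating P and Q gives a stripe that checks clean, yet the original byte is gone:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing modulo 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def parity(data: list[int]) -> tuple[int, int]:
    """Compute (P, Q): P = XOR of D_i, Q = XOR of g^i * D_i."""
    p = q = 0
    g_i = 1                       # g^0
    for d in data:
        p ^= d
        q ^= gf_mul(g_i, d)
        g_i = gf_mul(g_i, 2)      # advance to the next power of g
    return p, q

original = [0x11, 0x22, 0x33, 0x44]
p_old, q_old = parity(original)

on_disk = list(original)
on_disk[1] ^= 0x5A                # silent corruption of data drive 1

print(parity(on_disk) == (p_old, q_old))  # False: the old parity exposes the corruption

# Naive "repair": regenerate P and Q from whatever is on disk.
p_new, q_new = parity(on_disk)

print(parity(on_disk) == (p_new, q_new))  # True: the stripe is consistent again
print(on_disk == original)                # False: the bad byte is now protected by fresh parity
```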
Er, that's not how I read it at all. To quote:
In the case of data drive corruption, once the faulty drive has been
identified, recover using the P drive in the same way as a one-disk erasure
failure.
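Once the faulty drive index z is known, the quoted recovery step is just a one-disk erasure rebuild from P: D_z = P xor (XOR of all the other data bytes). A minimal sketch over a single byte position; recover_with_p is a hypothetical name:

```python
from functools import reduce
from operator import xor

def recover_with_p(data: list[int], z: int, p_stored: int) -> int:
    """Rebuild data drive z's byte from P, treating drive z as erased:
    D_z = P xor (XOR of every other D_i)."""
    others = reduce(xor, (d for i, d in enumerate(data) if i != z), 0)
    return p_stored ^ others

original = [0x11, 0x22, 0x33, 0x44]
p = reduce(xor, original, 0)      # P parity for this byte position

on_disk = list(original)
on_disk[2] ^= 0x5A                # drive 2 silently corrupted; z = 2 assumed already identified

print(hex(recover_with_p(on_disk, 2, p)))   # → 0x33
```

Only P is needed for the rebuild itself; Q's job was to identify z.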
So you want to catch these single-disk corruptions (data or parity) as soon
as possible so they don't accumulate. In general, if you have the redundancy
at the software RAID layer, it seems best not to push too hard on the
individual drive. Don't retry excessively (and depend on the per-block
checksums) or allow long timeouts. As soon as the error hits, do a write (to
remap the block); after all, do you trust a drive to read the sector on the
10th try more than you trust your parity calculations? If the drive's error
rate gets too high, drop the drive like a hot potato and scream bloody murder
so the admin feeds you a disk ASAP.
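A toy Python model of the policy described above: one read attempt, rebuild from parity on error, rewrite the sector so the drive can remap it, and drop the drive once errors pass a threshold. Drive, read_sector, MAX_ERRORS and the reconstruct/alert callbacks are all hypothetical, and the "write clears the bad sector" behaviour is a modelling assumption; this is not how Linux md is actually implemented:

```python
from dataclasses import dataclass, field
from typing import Callable

class ReadError(Exception):
    """Raised by the toy drive when a sector cannot be read."""

@dataclass
class Drive:
    name: str
    sectors: dict[int, int]
    bad_sectors: set[int] = field(default_factory=set)
    errors: int = 0
    failed: bool = False

    def read(self, lba: int) -> int:
        if lba in self.bad_sectors:
            raise ReadError(lba)
        return self.sectors[lba]

    def write(self, lba: int, value: int) -> None:
        # Modelling assumption: a write lets the drive remap the bad sector.
        self.bad_sectors.discard(lba)
        self.sectors[lba] = value

MAX_ERRORS = 3  # hypothetical threshold; a real site would tune this

def read_sector(drive: Drive, lba: int,
                reconstruct: Callable[[int], int],
                alert: Callable[[str], None]) -> int:
    if drive.failed:
        return reconstruct(lba)      # dropped drive: serve from parity only
    try:
        return drive.read(lba)       # single attempt, no retry loop
    except ReadError:
        value = reconstruct(lba)     # trust the parity math, not a 10th retry
        drive.write(lba, value)      # rewrite so the drive can remap the block
        drive.errors += 1
        if drive.errors >= MAX_ERRORS:
            drive.failed = True
            alert(f"{drive.name}: {drive.errors} read errors, dropped from array; replace ASAP")
        return value

truth = {0: 7, 1: 8, 2: 9, 3: 10}    # stands in for what parity would rebuild
alerts: list[str] = []
d = Drive("sdb", dict(truth), bad_sectors={0, 1, 2})
for lba in range(4):
    read_sector(d, lba, truth.__getitem__, alerts.append)
print(d.failed, alerts)
```

Keeping the retry count at one pushes the decision up to the RAID layer, which has the redundancy to answer correctly; the threshold is what turns a slowly dying disk into a loud alert instead of a silent source of latency.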
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf