Leif Nixon wrote:
> Mark Hahn <[EMAIL PROTECTED]> writes:
>
>>> If they find an inconsistent stripe they don't try to identify the
>>> corrupt block. Instead, they dumbly *recompute P and Q*, which of
>>> course makes the stripe consistent, but *leaves the corrupt data in
>>> place*.
>> I'm skeptical that MD would be so willfully stupid.
>
> Me too, but I'm looking at a trashed test file at this very moment.
>
>> did you conclude this from reading the code, empirical testing, or
>> from the author?
>
> Empirical testing, using raid 6 over file backed loop devices. I
RAIDs in general depend on two assumptions:
#1 When you ask for a write and do not get an error, the write happened.
#2 Corruptions in the media don't happen.

That sounds bad, but drives are pretty reliable and have per-sector checksums, so it's pretty unlikely for a corrupted sector to still produce the correct checksum. For this reason, using dd to damage a single disk of a RAID is not something the array can detect: the drive writes a valid checksum for every sector dd touches, so no read error is ever reported and the RAID set is silently corrupted.

Continuously reading from all drives and doing the parity calculation isn't practical, but I do agree that the scrub (which is fairly new, btw) should do this. A corruption that carries the correct block ECC is rare in practice, but there's no reason for scrub not to handle it correctly.

On the Seagate ES drive I checked, each sector is protected by a 10-bit ECC and they claim 1 non-recoverable error per 10^15 bits.

> *hope* there is something wrong with our methodology. We'll need to do
> a proper summary of our findings and raise the issue on the linux-raid
> mailing list.

I'll join and watch, thanks. I would be surprised if scrubbing doesn't do the right thing, but linux-raid is the best place to find out.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
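
The point under discussion, that a scrub could identify the corrupt block instead of recomputing P and Q, can be sketched roughly as follows. This is not MD's code. It is a minimal illustration that assumes the usual RAID 6 construction (P is the XOR of the data blocks, Q is a weighted sum over GF(2^8) with polynomial 0x11d and generator 2), assumes at most one data block is silently corrupt, and assumes P and Q themselves are intact. The names `syndromes` and `locate_and_repair` are made up for this example.

```python
# GF(2^8) log/antilog tables. Assumption: polynomial 0x11d, generator 2.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 510):
    EXP[i] = EXP[i - 255]  # duplicated so LOG[a] + LOG[b] never overflows


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Return (P, Q) for a list of equal-length data blocks."""
    n = len(data[0])
    p = bytearray(n)
    q = bytearray(n)
    for idx, block in enumerate(data):
        coef = EXP[idx]  # g**idx, the Q weight of data disk idx
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def locate_and_repair(data, p, q):
    """Find and fix a single corrupt data block, in place.

    Returns the index of the repaired block, or None if the stripe is
    consistent. Raises ValueError when the single-bad-data-block
    assumption does not hold (this sketch does not handle a bad P or Q).
    """
    p_now, q_now = syndromes(data)
    bad = None
    for j in range(len(p)):
        dp = p[j] ^ p_now[j]  # equals the error byte e
        dq = q[j] ^ q_now[j]  # equals g**z * e for bad disk z
        if dp == 0 and dq == 0:
            continue
        if dp == 0 or dq == 0:
            raise ValueError("P or Q itself looks corrupt")
        z = (LOG[dq] - LOG[dp]) % 255  # log of dq/dp gives z
        if z >= len(data) or (bad is not None and z != bad):
            raise ValueError("more than one block appears corrupt")
        bad = z
    if bad is None:
        return None
    fixed = bytearray(data[bad])
    for j in range(len(p)):
        fixed[j] ^= p[j] ^ p_now[j]  # XOR the error back out
    data[bad] = bytes(fixed)
    return bad
```

Recomputing P and Q from the damaged data would make both differences zero and throw away exactly the information this uses to find the bad disk.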