On Fri, Jul 10, 2015 at 9:34 PM, Chris Cappuccio <ch...@nmedia.net> wrote:
> My first impression, offlining the drive after a single chunk failure
> may be too aggressive as some errors are a result of issues other than
> drive failures.

Indeed, it may look too aggressive, but is the analysis I wrote in the
comment correct? I mean: if there is a write error, for whatever reason,
on one or more chunks and we completely ignore it because at least
one write succeeded, then the array is in an inconsistent state where
some drives hold the correct data and other drives hold the previous
data. Since reads are done in round-robin fashion, there is a
chance that you will read old data in the future. If this is correct,
then I think it calls for a fix.
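
To make the concern concrete, here is a toy model of a two-chunk mirror. It is only a sketch and not softraid code: the `Chunk` and `Mirror` classes and their methods are hypothetical names, and it assumes a write counts as successful as soon as any one chunk accepts it.

```python
# Toy model of a two-chunk mirror. Not softraid code; all names are hypothetical.
class Chunk:
    def __init__(self):
        self.blocks = {}
        self.fail_writes = False

    def write(self, blkno, data):
        if self.fail_writes:
            return False          # simulated write error; the old data stays in place
        self.blocks[blkno] = data
        return True


class Mirror:
    def __init__(self, chunks):
        self.chunks = chunks
        self.next_read = 0

    def write(self, blkno, data):
        results = [c.write(blkno, data) for c in self.chunks]
        return any(results)       # assumed policy: one successful chunk is enough

    def read(self, blkno):
        # Round-robin: each read goes to the next chunk in turn.
        c = self.chunks[self.next_read]
        self.next_read = (self.next_read + 1) % len(self.chunks)
        return c.blocks.get(blkno)


a, b = Chunk(), Chunk()
m = Mirror([a, b])
m.write(0, "old")                 # both chunks now hold "old"
b.fail_writes = True
ok = m.write(0, "new")            # chunk b fails, but the error is ignored
reads = [m.read(0) for _ in range(4)]
print(ok, reads)                  # True ['new', 'old', 'new', 'old']
```

The write is reported as successful, yet every second read comes back with the previous data.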

If you do not like off-lining drive(s) right after one failed read, then
perhaps the correct approach would be to restart the whole work unit and
force the write again? We could even have a threshold after which we stop
and consider the problematic block really not writable. Would something
like that be a better solution?
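
A rough sketch of the retry-with-threshold idea, again as a toy model and not softraid code. `FlakyChunk`, `write_with_retry` and the `MAX_RETRIES` value are hypothetical, and for simplicity it retries a single chunk write instead of restarting a whole work unit.

```python
# Toy sketch of retry-with-threshold. Not softraid code; names and limit are hypothetical.
MAX_RETRIES = 3   # assumed threshold, picked arbitrarily for illustration


class FlakyChunk:
    def __init__(self, failures):
        self.failures = failures  # how many writes fail before one succeeds
        self.blocks = {}
        self.online = True

    def write(self, blkno, data):
        if self.failures > 0:
            self.failures -= 1
            return False          # simulated write error
        self.blocks[blkno] = data
        return True


def write_with_retry(chunk, blkno, data, max_retries=MAX_RETRIES):
    # One initial attempt plus up to max_retries re-issued writes.
    for _ in range(1 + max_retries):
        if chunk.write(blkno, data):
            return True
    chunk.online = False          # threshold reached: treat the block as not writable
    return False


transient = FlakyChunk(failures=2)    # recovers on the third attempt
broken = FlakyChunk(failures=10)      # never recovers within the threshold
r1 = write_with_retry(transient, 0, "new")
r2 = write_with_retry(broken, 0, "new")
print(r1, transient.online, r2, broken.online)   # True True False False
```

With this policy a transient error does not cost a drive, while a persistently failing chunk still goes offline, so the mirror never silently keeps stale data.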

Thanks,
Karel
