Yo Mark!

On Fri, 21 Jun 2013 11:38:00 -0700
Mark Knecht <markkne...@gmail.com> wrote:

> On the read side I'm not sure if I'm understanding your point. I agree
> that a so-designed RAID1 system could/might read smaller portions of a
> larger read from RAID1 drives in parallel, taking some data from one
> drive and some from another drive, and then only take corrective
> action if one of the drives had troubles. However I don't know
> that mdadm-based RAID1 does anything like that. Does it?

It surely does.  I have confirmed that at least monthly since md has
existed in the kernel.

> It seems to me that unless I at least _request_ all data from all
> drives and minimally compare at least some error flag from the
> controller telling me one drive had trouble reading a sector then how
> do I know if anything bad is happening?

Correct.  You can't tell if you can read something without trying to
read it.  Which is why you should do a full raid rebuild every week.
> 
> Or maybe you're saying it's RAID1 and I don't know if anything bad is
> happening _unless_ I do a scrub and specifically check all the drives
> for consistency?

No.  A simple read will find the problem.  But given it is RAID1 the only
way to be sure to read from both drives is a raid rebuild.

> I do mdadm scrubs at least once a week. I still do them by hand. They
> have never appeared terribly expensive watching top or iotop but
> sometimes when I'm watching NetFlix or Hulu in a VM I get more pauses
> when the scrub is taking place, but it's not huge.

Which is why you should cron job them at oh-dark-thirty.
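
A rough sketch of what such a cron job could look like, assuming the
scrub is started through md's sysfs interface by writing "check" to
/sys/block/mdX/md/sync_action.  The function name, the --run flag and
the install path in the comment are made up for illustration, and some
distros already ship their own scheduled check job, so look for that
first.

```python
#!/usr/bin/env python3
"""Start an md 'check' pass on every idle array (sketch, run as root)."""
import glob
import os
import sys

# Hypothetical /etc/cron.d entry, Sundays at 03:30:
#   30 3 * * 0  root  /usr/local/sbin/md_check.py --run


def start_checks(sys_block="/sys/block"):
    """Write 'check' to sync_action of each idle md array.

    Returns the names of the arrays where a check was requested.
    Arrays that are already busy (resync, recover, check, ...) are skipped.
    """
    started = []
    pattern = os.path.join(sys_block, "md*", "md", "sync_action")
    for action_path in sorted(glob.glob(pattern)):
        with open(action_path) as f:
            state = f.read().strip()
        if state != "idle":
            continue  # do not disturb a running rebuild or scrub
        with open(action_path, "w") as f:
            f.write("check\n")
        # .../<name>/md/sync_action -> <name>
        started.append(os.path.basename(os.path.dirname(os.path.dirname(action_path))))
    return started


# Require an explicit --run so importing the file has no side effects.
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    for name in start_checks():
        print("check requested on", name)
```

Once the pass finishes, /sys/block/mdX/md/mismatch_cnt is the place to
look for what it found.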
>
> I agree that RAID5 gives you an opportunity to get things fixed, but
> there are folks who lose a disk in a RAID5, start the rebuild, and
> then lose a second disk during the rebuild.

Because they failed to do weekly rebuilds.

> Not that I would ever run the array degraded but that I
> could still tolerate a second loss while the rebuild was happening and
> hopefully get by.

Sadly most people make their RAID5 or RAID6 out of brand new,
consecutively serial numbered drives.  They then get exactly the
same temp, voltage, humidity, seek stress until they all fail within
days of each other.  I have personally seen 4 of 5 drives in a RAID5
fail within 3 days many times.  Usually on a Friday where the tech
decides the drive replacement can wait until Monday.

Your only protection against a full RAIDx failure is an offsite backup.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
        g...@rellim.com  Tel:+1(541)382-8588
