[OpenIndiana-discuss] vdev reliability was: Recommendations for fast storage

Sebastian Gabler Thu, 18 Apr 2013 03:47:42 -0700

On 18.04.2013 03:09, [email protected] wrote:

Date: Wed, 17 Apr 2013 13:21:08 -0600
From: Jan Owoc<[email protected]>
To: Discussion list for OpenIndiana
        <[email protected]>
Subject: Re: [OpenIndiana-discuss] Recommendations for fast storage


On Wed, Apr 17, 2013 at 12:57 PM, Timothy Coalson<[email protected]>  wrote:

>On Wed, Apr 17, 2013 at 7:38 AM, Edward Ned Harvey (openindiana) <
>[email protected]> wrote:
>

>>You also said the raidz2 will offer more protection against failure,
>>because you can survive any two disk failures (but no more.)  I would argue
>>this is incorrect (I've done the probability analysis before).  Mostly
>>because the resilver time in the mirror configuration is 8x to 16x faster
>>(there's 1/8 as much data to resilver, and IOPS is limited by a single
>>disk, not the "worst" of several disks, which introduces another factor up
>>to 2x, increasing the 8x as high as 16x), so the smaller resilver window
>>means lower probability of "concurrent" failures on the critical vdev.
>>  We're talking about 12 hours versus 1 week, actual result of my machines
>>in production.
>>

>
>Did you also compare the probability of bit errors causing data loss
>without a complete pool failure?  2-way mirrors, when one device completely
>dies, have no redundancy on that data, and the copy that remains must be
>perfect or some data will be lost.  On the other hand, raid-z2 will still
>have available redundancy, allowing every single block to have a bad read
>on any single component disk, without losing data.  I haven't done the math
>on this, but I seem to recall some papers claiming that this is the more
>likely route to lost data on modern disks, by comparing bit error rate and
>capacity.  Of course, a second outright failure puts raid-z2 in a much
>worse boat than 2-way mirrors, which is a reason for raid-z3, but this may
>already be a less likely case.

Richard Elling wrote a blog post about "mean time to data loss" [1]. A
few years later he graphed out a few cases for typical values of
resilver times [2].

[1]https://blogs.oracle.com/relling/entry/a_story_of_two_mttdl
[2]http://blog.richardelling.com/2010/02/zfs-data-protection-comparison.html

Cheers,
Jan
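For a feel of how resilver time enters an MTTDL model, below is a sketch of the classic textbook Markov approximation for a single vdev. It is not necessarily the exact model in Richard's posts. It assumes independent, exponentially distributed whole-disk failures and a repair time much shorter than the MTBF. The 1,000,000 h MTBF and the 10-disk raidz2 width are hypothetical values. The 12 h and 1 week resilver times are Edward's figures. Like the models linked above, it has no bit-error term at all.

```python
from math import prod

def mttdl_hours(n_disks: int, parity: int, mtbf_h: float, mttr_h: float) -> float:
    """Classic Markov approximation of mean time to data loss for ONE vdev.

    n_disks: disks in the vdev (2 for a 2-way mirror)
    parity:  concurrent failures the vdev survives (1 = mirror/raidz1, 2 = raidz2)
    Assumes independent, exponentially distributed whole-disk failures,
    mttr_h much smaller than mtbf_h, and no bit errors at all.
    """
    if not 0 < parity < n_disks:
        raise ValueError("need 0 < parity < n_disks")
    # n * (n-1) * ... * (n-parity): number of disks that can fail at each stage
    falling = prod(n_disks - i for i in range(parity + 1))
    return mtbf_h ** (parity + 1) / (falling * mttr_h ** parity)

MTBF = 1_000_000  # hypothetical per-disk MTBF in hours
mirror = mttdl_hours(2, 1, MTBF, mttr_h=12)    # 12 h resilver (Edward's figure)
raidz2 = mttdl_hours(10, 2, MTBF, mttr_h=168)  # 1 week resilver, assumed 10-wide
print(f"2-way mirror vdev: {mirror:.3g} h, 10-wide raidz2 vdev: {raidz2:.3g} h")
```

A pool fails when any of its vdevs does, so for k such vdevs the pool MTTDL is roughly the per-vdev figure divided by k. A fair comparison therefore uses pools of equal usable capacity, not single vdevs.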

Notably, Richard's posted models do not include BER (bit error rate). Nevertheless, it's an important factor. Off the top of my head, it will impact reliability in different ways in ZFS:


- Bit error in metadata (zfs should save us by metadata redundancy)
- Bit error in full stripe data
- Bit error in parity data

AFAIK, a bit error in parity or stripe data can be especially dangerous when it surfaces during resilvering and only one layer of redundancy is left. OTOH, BER issues scale with vdev size, not with rebuild time. So I think Tim actually made a valid point about a systematic weak spot of 2-way mirrors or raidz1 in vdevs that are large compared to the BER rating of their member drives. Consumer drives have a BER of 1 in 10^14 to 10^15 bits read; enterprise drives start at 1 in 10^16.

I do not think that ZFS will have better resilience against rot of parity data than conventional RAID. At best, block-level checksums can help raise an error, so you at least know that something went wrong, but recovery of the data will probably not be possible. So, in my opinion, BER is an issue under ZFS as anywhere else.
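To put rough numbers on this, here is a sketch estimating the chance of hitting unreadable data while resilvering a vdev that has lost one disk. Its assumptions are: independent bit errors at the vendor's quoted rate (real errors tend to cluster), 3 TB allocated per disk, and 4 KiB sectors with exactly one sector per disk in each raidz2 stripe. It also covers ordinary data blocks only, with no ditto copies of metadata. The function names and the 10-disk width are hypothetical. Resilver time appears nowhere: only the amount of data read and the BER matter.

```python
import math

SECTOR_BYTES = 4096  # assumed sector size; also each disk's share of a raidz2 stripe

def p_any_error(bits: float, ber: float) -> float:
    """P(at least one bad bit among `bits`), assuming independent errors at rate `ber`."""
    # 1 - (1 - ber)**bits, computed stably for tiny ber and huge bit counts
    return -math.expm1(bits * math.log1p(-ber))

def p_loss_degraded_mirror(bytes_per_disk: float, ber: float) -> float:
    """2-way mirror with one side dead: any unreadable bit on the survivor loses data."""
    return p_any_error(bytes_per_disk * 8, ber)

def p_loss_degraded_raidz2(bytes_per_disk: float, ber: float, width: int) -> float:
    """raidz2 of `width` disks with one dead: a stripe is lost only if two or more
    of its remaining width-1 sectors are unreadable."""
    k = width - 1
    q = p_any_error(SECTOR_BYTES * 8, ber)  # probability one sector is unreadable
    # Sum the binomial tail directly; 1 - P(0) - P(1) cancels catastrophically.
    p_stripe = sum(math.comb(k, j) * q**j * (1 - q)**(k - j) for j in range(2, k + 1))
    stripes = bytes_per_disk / SECTOR_BYTES
    return -math.expm1(stripes * math.log1p(-p_stripe))

TB = 1e12
for ber in (1e-14, 1e-15, 1e-16):
    mirror = p_loss_degraded_mirror(3 * TB, ber)
    raidz2 = p_loss_degraded_raidz2(3 * TB, ber, width=10)
    print(f"BER {ber:g}: degraded mirror {mirror:.2%}, degraded 10-wide raidz2 {raidz2:.2e}")
```

Doubling the data per disk roughly doubles the mirror figure while it is small, which is the "scales with vdev size" effect. For raidz1 the same single-redundancy reasoning as the mirror applies, since every surviving sector in a stripe must be readable.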


Best,

Sebastian

PS: It occurred to me that WD doesn't publish BER data for some of their drives (at least not for any of the ones I searched for while writing this). Does anybody happen to have the full specs for WD drives?


_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss
