Re: [OpenIndiana-discuss] vdev reliability was: Recommendations for fast storage

Sebastian Gabler Thu, 18 Apr 2013 08:24:44 -0700

On 18.04.2013 16:28, [email protected] wrote:

Message: 1
Date: Thu, 18 Apr 2013 12:17:47 +0000
From: "Edward Ned Harvey (openindiana)"<[email protected]>
To: Discussion list for OpenIndiana
        <[email protected]>
Subject: Re: [OpenIndiana-discuss] Recommendations for fast storage

>From: Timothy Coalson [mailto:[email protected]]
>
>Did you also compare the probability of bit errors causing data loss
>without a complete pool failure?  2-way mirrors, when one device
>completely dies, have no redundancy on that data, and the copy that
>remains must be perfect or some data will be lost.

I had to think about this comment for a little while to understand what you 
were saying, but I think I got it.  I'm going to rephrase your question:

If one device in a 2-way mirror becomes unavailable, then the remaining device 
has no redundancy.  So if a bit error is encountered on the (now non-redundant) 
device, then it's an uncorrectable error.  Question is, did I calculate that 
probability?

Answer is, I think so.  Modelling the probability of drive failure (either 
complete failure or data loss) is very complex and non-linear.  It also depends 
on the specific model of drive in question, and the graphs are typically not 
available.  So what I did was to start with some MTBDL graphs that I assumed to 
be typical, and then assume every data-loss event meant complete drive failure.

The thing is... bit errors can lead to corruption of files, or even to the
loss of a whole pool, without an additional faulted drive, because bit
errors do not necessarily lead to a drive error. The risk of a rebuild
failing is proportional to the BER of the drives involved, and it scales
with the amount of data moved, given that you don't have any further
redundancy left.

I agree with previous suggestions that scrubbing offers some degree of
protection against that issue. It doesn't do away with the risk of bit
errors in a situation where all redundancy has been stripped for some
reason. For this aspect, a second level of redundancy offers a clear
benefit. AFAIU, that was the valid point of the poster who raised the
controversy about the resilience of a single vdev with multiple redundancy
vs. multiple vdevs with single redundancy.

As far as scrubbing is concerned, it is true that it reduces the risk of a
bit error surfacing precisely during a rebuild. However, in cases where you
deliberately pull redundancy, e.g. for swapping drives with larger ones,
you will want to have a valid backup, and thus you will not have too much
WORN data. In either case, it is user-driven. That is, scrub is not
proactive by itself, but it gives the user a tool to be proactive about
WORN data, which are indeed the data primarily prone to bit rot.
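
To put rough numbers on the point that rebuild risk grows with the BER and with the amount of data moved, here is a minimal sketch. It assumes independent bit errors at a constant rate, which real drives do not strictly follow, and the BER values and data sizes are illustrative assumptions, not figures from any drive in this thread. The function name is made up for this example.

```python
import math

def p_unrecoverable_read(bytes_read: float, ber: float) -> float:
    """Probability of hitting at least one bit error while reading
    bytes_read bytes from a device with no redundancy left.

    Assumes independent bit errors at a constant rate `ber`
    (errors per bit read). This is a simplification.
    """
    bits = bytes_read * 8
    # 1 - (1 - ber)**bits, written with log1p/expm1 so that very
    # small ber values do not vanish in floating-point rounding.
    return -math.expm1(bits * math.log1p(-ber))

# Illustrative values only: hypothetical BERs and resilver sizes.
for ber in (1e-14, 1e-15):
    for tb in (1, 2, 4):
        p = p_unrecoverable_read(tb * 1e12, ber)
        print(f"BER {ber:g}, {tb} TB read: P(at least one bit error) = {p:.4f}")
```

Under these assumptions the probability is close to BER times the number of bits read as long as that product is small, which is the proportionality described above. With a second level of redundancy, a single such error would still be correctable.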


BR

Sebastian

