On 18.04.2013 16:28, [email protected] wrote:
Message: 1
Date: Thu, 18 Apr 2013 12:17:47 +0000
From: "Edward Ned Harvey (openindiana)"<[email protected]>
To: Discussion list for OpenIndiana
        <[email protected]>
Subject: Re: [OpenIndiana-discuss] Recommendations for fast storage
Message-ID:
        
<d1b1a95fbdcf7341ac8eb0a97fccc4773bbf3...@sn2prd0410mb372.namprd04.prod.outlook.com>
        
Content-Type: text/plain; charset="us-ascii"

>From: Timothy Coalson [mailto:[email protected]]
>
>Did you also compare the probability of bit errors causing data loss
>without a complete pool failure?  2-way mirrors, when one device completely
>dies, have no redundancy on that data, and the copy that remains must be
>perfect or some data will be lost.
I had to think about this comment for a little while to understand what you 
were saying, but I think I got it.  I'm going to rephrase your question:

If one device in a 2-way mirror becomes unavailable, then the remaining device 
has no redundancy.  So if a bit error is encountered on the (now non-redundant) 
device, then it's an uncorrectable error.  Question is, did I calculate that 
probability?

Answer is, I think so.  Modelling the probability of drive failure (either 
complete failure or data loss) is very complex and non-linear.  Also dependent 
on the specific model of drive in question, and the graphs are typically not 
available.  So what I did was to start with some MTBDL graphs that I assumed to 
be typical, and then assume every data-loss event meant complete drive failure.
The thing is, bit errors can lead to corruption of files, or even to the loss of a whole pool, without an additional faulted drive, because bit errors do not necessarily show up as a drive error. Once no further redundancy is left, the risk of a rebuild failing is roughly proportional to the BER of the drives involved and scales with the amount of data moved.

I agree with the previous suggestions that scrubbing offers some degree of protection against that issue. It does not do away with the risk of bit errors in a situation where all redundancy has been stripped for some reason. For this aspect, a second level of redundancy offers a clear benefit. AFAIU, that was the valid point of the poster who raised the controversy about the resilience of a single vdev with multiple redundancy vs. multiple vdevs with single redundancy.

As far as scrubbing is concerned, it is true that it reduces the risk of a bit error rearing its head precisely during a rebuild. However, in cases where you deliberately pull redundancy, e.g. when swapping drives for larger ones, you will want to have a valid backup, and thus you will not have too much WORN data. In either case it is user-driven: scrub is not proactive by itself, but it gives the user a tool to be proactive about WORN data, which is indeed the data primarily prone to bit rot.
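
To put rough numbers on the rebuild risk, here is a minimal Python sketch. It assumes bit errors are independent and occur at a constant per-bit rate, which real drives only approximate. The function names and the BER figures (1e-14 and 1e-15) are illustrative assumptions, not measured values for any particular drive.

```python
import math


def rebuild_failure_probability(bytes_read: float, ber: float) -> float:
    """Chance of at least one unrecoverable bit error while reading
    `bytes_read` bytes from a device with no redundancy left.

    Assumption: errors are independent with a constant per-bit rate `ber`.
    """
    bits = bytes_read * 8
    # 1 - (1 - ber)**bits, written with log1p/expm1 to stay accurate
    # when ber is tiny and bits is huge.
    return -math.expm1(bits * math.log1p(-ber))


def linear_estimate(bytes_read: float, ber: float) -> float:
    """Small-probability approximation: risk ~ BER * bits moved."""
    return bytes_read * 8 * ber


if __name__ == "__main__":
    TB = 1e12  # decimal terabyte, as used on drive labels
    for ber in (1e-14, 1e-15):  # assumed example error rates
        for size_tb in (1, 2, 4):
            p = rebuild_failure_probability(size_tb * TB, ber)
            approx = linear_estimate(size_tb * TB, ber)
            print(f"BER={ber:.0e} data={size_tb} TB "
                  f"exact={p:.4f} linear={approx:.4f}")
```

Under these assumptions, reading 2 TB at an assumed BER of 1e-14 gives a risk of roughly 15%, and the linear estimate stays close to the exact value as long as the product of BER and bits moved is small.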

BR

Sebastian

_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss
