On 18.04.2013 16:28, [email protected] wrote:
Message: 1
Date: Thu, 18 Apr 2013 12:17:47 +0000
From: "Edward Ned Harvey (openindiana)"<[email protected]>
To: Discussion list for OpenIndiana
<[email protected]>
Subject: Re: [OpenIndiana-discuss] Recommendations for fast storage
>From: Timothy Coalson [mailto:[email protected]]
>
>Did you also compare the probability of bit errors causing data loss
>without a complete pool failure? 2-way mirrors, when one device completely
>dies, have no redundancy on that data, and the copy that remains must be
>perfect or some data will be lost.
I had to think about this comment for a little while to understand what you
were saying, but I think I got it. I'm going to rephrase your question:
If one device in a 2-way mirror becomes unavailable, then the remaining device
has no redundancy. So if a bit error is encountered on the (now non-redundant)
device, then it's an uncorrectable error. Question is, did I calculate that
probability?
Answer is, I think so. Modelling the probability of drive failure (either
complete failure or data loss) is very complex and non-linear. Also dependent
on the specific model of drive in question, and the graphs are typically not
available. So what I did was to start with some MTBDL graphs that I assumed to
be typical, and then assume every data-loss event meant complete drive failure.
The thing is, bit errors can corrupt files, or even cost you a whole
pool, without any additional drive faulting, because a bit error does
not necessarily show up as a drive error. Once no further redundancy is
left, the risk of a rebuild failing is roughly proportional to the BER of
the drives involved and scales with the amount of data moved. I agree
with the earlier suggestions that scrubbing offers some degree of
protection against that issue. It does not do away with the risk of bit
errors while all redundancy is stripped for some reason. For this
aspect, a second level of redundancy offers a clear benefit.
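To put rough numbers on how rebuild risk scales with BER and the amount
of data moved, here is a minimal sketch. It assumes independent bit
errors at a constant per-bit rate, which real drives will not follow
exactly. The function name and the BER values (1e-14 and 1e-15, of the
order often quoted in drive datasheets) are illustrative assumptions,
not figures from this thread.

```python
import math


def p_unrecoverable_read(bytes_read: float, ber: float) -> float:
    """Probability of at least one bit error while reading `bytes_read` bytes.

    Assumes independent errors at a constant per-bit error rate `ber`
    (a simplification; real drives do not behave this cleanly).
    """
    bits = bytes_read * 8
    # Equivalent to 1 - (1 - ber) ** bits, but numerically stable for tiny ber.
    return -math.expm1(bits * math.log1p(-ber))


if __name__ == "__main__":
    TB = 1e12  # decimal terabyte, as used by drive vendors
    # Hypothetical rebuild sizes and error rates, for illustration only.
    for size_tb in (1, 2, 4):
        for ber in (1e-14, 1e-15):
            p = p_unrecoverable_read(size_tb * TB, ber)
            print(f"{size_tb} TB read at BER {ber:g}: {p:.2%}")
```

For small products of BER and bits read, the result is close to
BER * bits, which is the "proportional to BER, scales with data moved"
behaviour described above. With a second level of redundancy still in
place, a single such error during rebuild remains correctable.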
AFAIU, that was the valid point of the poster raising the controversy
about resilience of a single vdev with multiple redundancy vs. multiple
vdevs with single redundancy.
As far as scrubbing is concerned, it is true that it reduces the risk
of a bit error surfacing precisely during a rebuild. However, in cases
where you deliberately pull redundancy, e.g. to swap drives for larger
ones, you will want to have a valid backup anyway, and thus you will not
have too much WORN data. In either case it is user-driven: scrub is not
pro-active by itself, but it gives the user a tool to be proactive about
WORN data, which is indeed the data most prone to bit rot.
BR
Sebastian