We currently have 84 disks: 72 internal and 12 in a JBOD.

The internal drives are named: (Front|Back)Row([1-6])Column([1-4])(Near|Far)

(e.g. FrontRow1Column4Far )

We use /etc/zfs/vdev_id.conf to name them that way, by path, to ensure that 
they stay where they are put.

Our jbod disks are named JBOD(\d+)Disk(\d+)  - currently we have 1 jbod with 12 
disks.
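
For reference, the vdev_id.conf entries look roughly like the sketch below; the by-path device names are hypothetical placeholders, not our actual paths:

```
# /etc/zfs/vdev_id.conf (sketch -- device paths are hypothetical)
alias FrontRow1Column4Far  /dev/disk/by-path/pci-0000:03:00.0-sas-phy7-lun-0
alias BackRow6Column1Near  /dev/disk/by-path/pci-0000:03:00.0-sas-phy12-lun-0
alias JBOD1Disk1           /dev/disk/by-path/pci-0000:81:00.0-sas-phy0-lun-0
```

Because the alias is keyed to the physical path, a replacement disk in the same slot automatically gets the same name.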

Currently all our disks are in 6-disk RaidZ2 vdevs; they were added one set at 
a time.

When we got to 66 disks, we had some weirdness, and I thought I might have to 
rebuild the pool. Since we can't remove vdevs from the array (only redundant 
disks), I decided to create a new storage pool instead. This also means that if 
we have a catastrophic failure (e.g. losing 3 disks from the same raidz2), we 
still keep some of our data. Then, when we got the JBOD, there was talk of 
moving it around, so I put it in a separate pool too.

How does reliability scale as I add more redundancy?
For example, if I have 36 disks (the least common multiple of the three 
layouts), what is the reliability of:

6 stripes of 6 disks as raidz2
4 stripes of 9 disks as raidz3
2 stripes of 12 disks as raidz3
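
For a rough comparison, here is a minimal sketch (my own, not from this thread) using a simple binomial model: each disk fails independently with probability p during the window where its loss matters, and a vdev is lost if more disks fail than it has parity. The value p = 0.01 is an assumed placeholder, not a measured failure rate.

```python
from math import comb

def p_vdev_loss(n, parity, p):
    """Probability that more than `parity` of `n` disks fail together
    (binomial model: independent failures, per-disk probability p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(parity + 1, n + 1))

def p_pool_loss(stripes, n, parity, p):
    """The pool is lost if any one vdev loses more disks than its parity."""
    return 1 - (1 - p_vdev_loss(n, parity, p))**stripes

p = 0.01  # assumed per-disk failure probability (placeholder)
layouts = [
    ("6 x  6-disk raidz2", 6,  6, 2),
    ("4 x  9-disk raidz3", 4,  9, 3),
    ("2 x 12-disk raidz3", 2, 12, 3),
]
for name, stripes, width, parity in layouts:
    print(f"{name}: P(data loss) ~ {p_pool_loss(stripes, width, parity, p):.2e}")
```

Under this (crude) model the two raidz3 layouts come out roughly an order of magnitude safer than the raidz2 layout; real numbers depend heavily on resilver time and correlated failures, which this ignores.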

Jort

On 07/03/16 18:06, Fred Liu wrote:



Jort Bloem



Technical Engineer  -  Auckland

Business Technology Group LTD



p: +64 9 580 1374 x9884

m: +64 21 326 000

[email protected]

2016-03-06 22:49 GMT+08:00 Richard Elling <[email protected]>:

On Mar 3, 2016, at 8:35 PM, Fred Liu <[email protected]> wrote:

Hi,

Today, while reading the introduction to Jeff's new nuclear weapon -- DSSD 
D5's CUBIC RAID -- an interesting survey question popped into my head: what is 
the zpool with the most disks you have ever built?

We test to 2,000 drives. Beyond 2,000 there are some scalability issues that 
impact failover times.
We’ve identified these and know what to fix, but need a real customer at this 
scale to bump it to
the top of the priority queue.

[Fred]: Wow! 2,000 drives would need 4~5 whole racks!

Since ZFS doesn't support nested vdevs, the maximum fault tolerance should be 
three (from raidz3).

Pedantically, it is N, because you can have N-way mirroring.

[Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in 
theory but rarely happens in practice.

This leaves you stuck if you want to build a very large pool.

Scaling redundancy by increasing parity improves data loss protection by about 
3 orders of
magnitude. Adding capacity by striping reduces data loss protection by 1/N. 
This is why there is
not much need to go beyond raidz3. However, if you do want to go there, adding 
raidz4+ is
relatively easy.
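
As a rough illustration of those two scaling claims, here is a sketch under the same simple independent-failure binomial model (p = 0.01 is an assumed placeholder, and the exact order-of-magnitude gain depends on the model and resilver window):

```python
from math import comb

def p_vdev_loss(n, parity, p):
    """P(more than `parity` of `n` disks fail), independent failures."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(parity + 1, n + 1))

p = 0.01  # assumed per-disk failure probability (placeholder)

# Extra parity: one more parity disk on a 6-wide vdev cuts the loss
# probability by a couple of orders of magnitude in this model.
z2 = p_vdev_loss(6, 2, p)
z3 = p_vdev_loss(6, 3, p)
print(f"raidz2 -> raidz3 improvement: ~{z2 / z3:.0f}x")

# Striping: a pool of N vdevs is lost if any one vdev is, so for small
# per-vdev probabilities the pool-loss probability grows ~linearly in N.
for n_vdevs in (1, 5, 10):
    print(f"{n_vdevs} vdevs: P(pool loss) ~ {1 - (1 - z2)**n_vdevs:.2e}")
```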

[Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2,000 
drives. If that is true, the probability of losing 4 of 2,000 drives at once 
will not be so low. Plus, resilvering takes longer as single-disk capacity 
grows. Further, the cost of over-provisioning spare disks vs. raidz4+ becomes 
a worthwhile trade-off when the storage mesh is at the scale of 2,000 drives.

Thanks.

Fred


--

[email protected]
+1-760-896-4422








-------------------------------------------
openzfs-developer
Archives: https://www.listbox.com/member/archive/274414/=now
RSS Feed: https://www.listbox.com/member/archive/rss/274414/28015062-cce53afa
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=28015062&id_secret=28015062-f966d51c
Powered by Listbox: http://www.listbox.com
