We currently have 84 disks: 72 internal and 12 in a JBOD.
The internal drives are named (Front|Back)Row([1-6])Column([1-4])(Near|Far), e.g. FrontRow1Column4Far. We use /etc/zfs/vdev_id.conf to name them that way, by path, to ensure that they stay where they are put. Our JBOD disks are named JBOD(\d+)Disk(\d+); currently we have 1 JBOD with 12 disks.

All our disks are in 6-disk raidz2 vdevs; they were added one set at a time. When we got to 66 disks we had some weirdness, and I thought I might have to rebuild the pool. Since we can't remove drives from the array (except redundant disks), I decided to create a new storage pool instead. This also means that if we do have a catastrophic failure (e.g. lose 3 disks from the same raidz2), we still have some data. Then, when we got the JBOD, there was talk of moving it around, so I put it in a separate pool too.

How does reliability scale as I add more redundancy? For example, if I have 36 disks (lowest common denominator), what is the reliability of:

- 6 stripes of 6 disks as raidz2
- 4 stripes of 9 disks as raidz3
- 2 stripes of 12 disks as raidz3

Jort

Jort Bloem, Technical Engineer - Auckland
Business Technology Group LTD
p: +64 9 580 1374 x9884  m: +64 21 326 000
[email protected]

On 07/03/16 18:06, Fred Liu wrote:

2016-03-06 22:49 GMT+08:00 Richard Elling <[email protected]>:

On Mar 3, 2016, at 8:35 PM, Fred Liu <[email protected]> wrote:

Hi,

Today, while reading the introduction to Jeff's new nuclear weapon -- DSSD D5's CUBIC RAID -- an interesting survey question popped into my brain: what is the zpool with the most disks you have ever built?

We test to 2,000 drives. Beyond 2,000 there are some scalability issues that impact failover times. We've identified these and know what to fix, but need a real customer at this scale to bump it to the top of the priority queue.

[Fred]: Wow! 2,000 drives need almost 4~5 whole racks!
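For reference, the slot-based naming Jort describes is done with `alias` entries in /etc/zfs/vdev_id.conf; a minimal sketch might look like the following (the by-path device links are hypothetical placeholders, not Jort's actual topology):

```conf
# /etc/zfs/vdev_id.conf -- map physical slots to stable names
# (by-path values below are illustrative placeholders)
alias FrontRow1Column1Near  /dev/disk/by-path/pci-0000:03:00.0-sas-phy0-lun-0
alias FrontRow1Column4Far   /dev/disk/by-path/pci-0000:03:00.0-sas-phy7-lun-0
alias JBOD1Disk1            /dev/disk/by-path/pci-0000:81:00.0-sas-phy0-lun-0
```

After editing the file, `udevadm trigger` regenerates the /dev/disk/by-vdev/ links, so the names survive controller re-enumeration.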
Since ZFS doesn't support nested vdevs, the maximum fault tolerance should be three (from raidz3).

Pedantically, it is N, because you can have N-way mirroring.

[Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in theory but rarely happens in reality. It is stranded if you want to build a very huge pool.

Scaling redundancy by increasing parity improves data loss protection by about 3 orders of magnitude. Adding capacity by striping reduces data loss protection by 1/N. This is why there is not much need to go beyond raidz3. However, if you do want to go there, adding raidz4+ is relatively easy.

[Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2,000 drives. If that is true, the probability of 4 failures out of 2,000 drives will not be so low. Also, resilvering takes longer when a single disk has a bigger capacity. Furthermore, over-provisioning spare disks vs. raidz4+ would be a worthwhile trade-off at the scale of 2,000 drives.

Thanks.

Fred

--
[email protected]
+1-760-896-4422

-------------------------------------------
openzfs-developer
Archives: https://www.listbox.com/member/archive/274414/=now
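A rough way to compare the 36-disk layouts Jort asked about, and to see the parity-vs-striping scaling described above, is a simple binomial sketch. It assumes independent disk failures with a hypothetical per-disk failure probability p within one resilver/exposure window; real failures are correlated, so treat the absolute numbers as illustrative only:

```python
from math import comb

def vdev_loss(n, parity, p):
    """P(a raidz-`parity` vdev of n disks loses data): strictly more
    than `parity` of its disks fail within the same exposure window."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(parity + 1, n + 1))

def pool_loss(vdevs, n, parity, p):
    """P(pool loss): the pool dies if at least one vdev dies."""
    return 1 - (1 - vdev_loss(n, parity, p))**vdevs

p = 0.001  # hypothetical per-disk failure probability in the window
for vdevs, n, parity in [(6, 6, 2), (4, 9, 3), (2, 12, 3)]:
    print(f"{vdevs} x raidz{parity}({n} disks): "
          f"P(data loss) ~ {pool_loss(vdevs, n, parity, p):.2e}")

# Each extra parity level shrinks per-vdev loss by a factor of roughly
# 1/p (orders of magnitude); striping m vdevs multiplies pool loss by ~m.
```

Under these assumptions 4 x raidz3(9) comes out most reliable and 6 x raidz2(6) least; but note that resilver time grows with disk and vdev size, which raises p itself for wider vdevs, a penalty this sketch ignores.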
