According to Bruce Allen:
[...]
>
> In a system with 24 x 500 GB disks, I would like to have usable storage of
> 20 x 500 GB and use the remaining disks for redundancy. What do you
> recommend? If I understand correctly I can't boot from ZFS so one or more
> of the remaining 4 disks might be needed for the OS.
>
This is (in my opinion) probably the only real issue with the X4500. The
system disk(s) must be placed with the data disks (since there are "only"
48 disk slots) and the two bootable disks are on the same controller, which
effectively makes that controller a single point of failure (there are easy
ways to move the second system disk to another controller, but you still
need a working "first" controller to boot).

Using ZFS for "/" is not easily done yet (as far as I know it's only
available in OpenSolaris at the moment and it's not even available at
installation time), so you need to use SVM (Solaris Volume Manager) if you
want to mirror the system disk.
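For what it's worth, mirroring the system disk with SVM boils down to the
usual root-mirroring procedure, roughly as follows. This is only a sketch:
the metadevice names and the slice layout (s0 for root, a small s7 for the
state database replicas) are merely examples, and the two bootable slots
are assumed to be c5t0 and c5t4 (the "Sys1" and "Sys2" cells in the tables
below).

    # State database replicas on a small dedicated slice of each system disk
    metadb -a -f -c 3 c5t0d0s7 c5t4d0s7

    # One-way mirror on the current root slice...
    metainit -f d11 1 1 c5t0d0s0
    metainit d12 1 1 c5t4d0s0
    metainit d10 -m d11
    metaroot d10

    # ...then, after a reboot on the metadevice, attach the second half
    metattach d10 d12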
The ZFS configurations we use minimize the impact of a single failing
controller (which becomes more likely since there are 6 of them), although
in our experience controller failures are rare on the X4500 (one failure in
over a year with a few tens of X4500s). The controllers are "simple" SATA
controllers, so they are probably less likely to fail than more advanced
RAID controllers. The most frequent failures are (obviously) disk failures
(about 3 per week).

Below is an X4500 disk tray as seen from "above". The columns are the
controllers (they are physically laid out that way), the rows are the SCSI
targets (for instance the "Sys1" cell, which is the first bootable device,
is -- in Solaris lingo -- c5t0 aka c5t0d0). The "vX" marks in the cells
indicate membership in a "vdev" (a ZFS "virtual device", which can be a
single disk or metadevice, an n-way mirror or a "raidz" volume). Here the
vdevs are all raidz.

This is the default configuration provided by Sun, which is globally pretty
good (in terms of redundancy, reliability and performance):

              +-----------------------------------------------+
              |                  Controllers                  |
              +-----------------------------------------------+
              |   c5  |   c4  |   c7  |   c6  |   c1  |   c0  |
+-------------+-----------------------------------------------+
^      7      |  v1   |  v1   |  v1   |  v1   |  v1   |  v1   |
|      -------+-----------------------------------------------+
|      6      |  v2   |  v2   |  v2   |  v2   |  v2   |  v2   |
|      -------+-----------------------------------------------+
|      5      |  v3   |  v3   |  v3   |  v3   |  v3   |  v3   |
|      -------+-----------------------------------------------+
D      4      | Sys2  |  v4   |  v4   |  v4   |  v4   |  v4   |
i      -------+-----------------------------------------------+
s      3      |  v5   |  v5   |  v5   |  v5   |  v5   |  v5   |
k      -------+-----------------------------------------------+
s      2      |  v6   |  v6   |  v6   |  v6   |  v6   |  v6   |
|      -------+-----------------------------------------------+
|      1      |  v7   |  v7   |  v7   |  v7   |  v7   |  v7   |
|      -------+-----------------------------------------------+
|      0      | Sys1  |  v8   |  v8   |  v8   |  v8   |  v8   |
+-------------------------------------------------------------+
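To make that layout concrete, creating such a pool would look more or less
like this (the pool name "tank" is only a placeholder; I believe the '-f'
is needed here because two of the vdevs have only five disks, see below
about vdevs of different sizes):

    zpool create -f tank \
        raidz c5t7d0 c4t7d0 c7t7d0 c6t7d0 c1t7d0 c0t7d0 \
        raidz c5t6d0 c4t6d0 c7t6d0 c6t6d0 c1t6d0 c0t6d0 \
        raidz c5t5d0 c4t5d0 c7t5d0 c6t5d0 c1t5d0 c0t5d0 \
        raidz        c4t4d0 c7t4d0 c6t4d0 c1t4d0 c0t4d0 \
        raidz c5t3d0 c4t3d0 c7t3d0 c6t3d0 c1t3d0 c0t3d0 \
        raidz c5t2d0 c4t2d0 c7t2d0 c6t2d0 c1t2d0 c0t2d0 \
        raidz c5t1d0 c4t1d0 c7t1d0 c6t1d0 c1t1d0 c0t1d0 \
        raidz        c4t0d0 c7t0d0 c6t0d0 c1t0d0 c0t0d0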
All our machines have 48 disks, but we tested a beta version of the X4500
with only 24 disks a bit more than one year ago. Only the "lower" 24 disk
slots were populated (disks on rows 0 to 3). I'm not sure how the system
handles a system disk failure in such a case, since the second bootable
disk slot is empty, but you could use something like this:

              +-----------------------------------------------+
              |                  Controllers                  |
              +-----------------------------------------------+
              |   c5  |   c4  |   c7  |   c6  |   c1  |   c0  |
+-------------+-----------------------------------------------+
^      7      | empty | empty | empty | empty | empty | empty |
|      -------+-----------------------------------------------+
|      6      | empty | empty | empty | empty | empty | empty |
|      -------+-----------------------------------------------+
|      5      | empty | empty | empty | empty | empty | empty |
|      -------+-----------------------------------------------+
D      4      | empty | empty | empty | empty | empty | empty |
i      -------+-----------------------------------------------+
s      3      |  v1   |  v1   |  v1   |  v1   |  v1   |  v1   |
k      -------+-----------------------------------------------+
s      2      |  v2   |  v2   |  v2   |  v2   |  v2   |  v2   |
|      -------+-----------------------------------------------+
|      1      |  v3   |  v3   |  v3   |  v3   |  v3   |  v3   |
|      -------+-----------------------------------------------+
|      0      | Sys1  | spare | spare | spare | spare | Sys2  |
+-------------------------------------------------------------+

That is only 15x500 GB of usable space, but with lots of security. In order
to match your usable space requirement, you can use two or three (three
would be better) of the spare disks in the vdevs, but this makes the
machine globally less resilient to controller failures.

You can also avoid the second system disk altogether and use the last 5
disks on row 0 as a 4th vdev (ZFS allows vdevs to be of different sizes,
even though this requires a '-f' -- "force" -- flag to the "zpool create"
call), which would yield 19x500 GB of usable space. It's of course better
to have vdevs of similar size, but the available space is not limited by
the smallest vdev (unlike most RAID-5/RAID-6 implementations). The size
difference has a small (but visible) impact on performance but, depending
on your I/O workload, you can still get more throughput from the disks than
the 4 on-board Gigabit interfaces can handle. According to several Sun
engineers, it's also highly recommended to have (raidz) vdevs of only a few
disks (fewer than 10).

A more interesting configuration with 24 disks would be:

              +-----------------------------------------------+
              |                  Controllers                  |
              +-----------------------------------------------+
              |   c5  |   c4  |   c7  |   c6  |   c1  |   c0  |
+-------------+-----------------------------------------------+
                         [Empty slots]
D      -------+-----------------------------------------------+
i      3      |  v1   |  v1   |  v2   |  v2   |  v3   |  v3   |
s      -------+-----------------------------------------------+
k      2      |  v1   |  v1   |  v2   |  v2   |  v3   |  v3   |
s      -------+-----------------------------------------------+
|      1      |  v1   |  v1   |  v2   |  v2   |  v3   |  v3   |
|      -------+-----------------------------------------------+
|      0      | Sys1  |  v1   |  v2   | spare |  v3   | Sys2  |
+-------------------------------------------------------------+

ZFS hot-spares are global to a "pool", so the disk in "c6t0" can replace
any data disk. This gives 4 security disks (3 parity + 1 spare) and 3
identically sized vdevs, with a usable space of 18x500 GB.
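In "zpool create" terms, this would give something like the following (pool
name again a placeholder; no '-f' is needed this time since the three vdevs
have the same width):

    zpool create tank \
        raidz c5t3d0 c5t2d0 c5t1d0 c4t3d0 c4t2d0 c4t1d0 c4t0d0 \
        raidz c7t3d0 c7t2d0 c7t1d0 c6t3d0 c6t2d0 c6t1d0 c7t0d0 \
        raidz c1t3d0 c1t2d0 c1t1d0 c0t3d0 c0t2d0 c0t1d0 c1t0d0 \
        spare c6t0d0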
A bold configuration would be (starting from the previous one) to use the
"spare" and "Sys2" disks in "v2" and "v3" respectively, to get 20x500 GB of
usable space, but at the expense of having no hot-spare and no protection
for the system disk.

If I'm not mistaken, Sun now sells (again) X4500s with 250 GB disks. In
order to get 10 TB per server, it's probably better (in terms of
performance and data security) to have 48x250 GB, although I guess that you
plan to buy a "half-full" machine to be able to add 24 larger (and cheaper)
disks later.

There are several very interesting blogs from Sun engineers about ZFS
(linked from <http://blogs.sun.com/main/tags/zfs>). For instance, this
entry deals with the balance between data security and available (raw)
space, using some hard disk drive reliability figures:
<http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl>.

Loïc.
--
| Loïc Tortay <[EMAIL PROTECTED]> - IN2P3 Computing Centre |