Loic Tortay wrote:
> We specifically use a ZFS configuration that is resistant to a single
> controller failure.
>
> With one controller failed (8 disks unavailable) no data is lost.
> Of course the machine is then very fragile since a single disk failure
> on another controller will lead to data loss.
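For reference, a minimal sketch of why a layout like that behaves this way. It assumes a hypothetical X4500-like box with 6 SATA controllers of 8 disks each and raidz1-style groups built one-disk-per-controller; the actual pool layout may well differ.

    # Minimal sketch (NOT the actual pool layout): 48 disks on 6 controllers
    # of 8 disks each, grouped into 8 raidz1-style groups of 6 disks, with
    # each group taking exactly one disk from every controller.
    CONTROLLERS = 6
    DISKS_PER_CONTROLLER = 8
    PARITY = 1                       # raidz1: one missing disk per group is OK

    # disk (c, d) = slot d on controller c; group g holds slot g of every controller
    groups = [[(c, g) for c in range(CONTROLLERS)]
              for g in range(DISKS_PER_CONTROLLER)]

    def data_loss(failed):
        """Data is lost once any group is missing more disks than it has parity."""
        return any(sum(d in failed for d in group) > PARITY for group in groups)

    # A whole controller dies: 8 disks gone, but only one per group -> no data loss.
    dead_controller = {(0, slot) for slot in range(DISKS_PER_CONTROLLER)}
    print(data_loss(dead_controller))                 # False: degraded, still alive

    # One more disk fails on another controller -> its group exceeds parity.
    print(data_loss(dead_controller | {(3, 5)}))      # True: that group is gone

With one disk of parity per group, losing an entire controller degrades every group by exactly one disk, so the pool survives; any further disk failure elsewhere pushes some group past its parity.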
Ahh... Ok.  We can use 4 controllers, and set it up so that we can lose
one w/o loss of data (12 drives/controller as compared to 8), though you
will see a corresponding decrease in storage capacity.

> I think Bruce's initial implied question was, "did you experience
> another hardware failure on that machine before the repair that
> ultimately led to data loss?"  The answer to that question is no.
>
> My point regarding the two controllers in your machine was that with
> two controllers you can't have a configuration resistant to a single
> controller failure unless you mirror the data (or add optional
> controllers).

See above.  We usually recommend RAIN for this anyway ... the cost of
internal / external replication is comparable to RAIN, and RAIN is more
resilient by design.  In a good RAIN design you isolate failures within
hierarchies.

> Replacing the mainboard in a X4500 is actually easier than replacing a
> PCI-e card.

???  That would take the unit offline.  PCIe cards can be hot-swapped.
Hot-swapping mainboards???

> You can change the "control module" without taking the machine out of
> its rack and there's no (internal) cable to unplug.

Ahhh....

> But in this case I happen to be plain wrong.  As I've been told by one
> of my coworkers in charge of the X4500 operations, the SATA controllers
> of the X4500 are not on the mainboard but on the backplane.  Changing
> the backplane requires more work than changing a PCI-e card.

That's what I had thought.  It requires lifting the drives off of the
backplane, as I remember.

>>> The density of the X4500 is also slightly better (48 disks in 4U
>>> instead of 5U).
>
> Sorry, you're right.
>
> I was referring to density in terms of disk slots per rack unit but
> forgot to mention it.
>
> [...]
>
>>> As of today we have 112 X4500; 112U are almost 3 racks, which is quite
>>> a lot due to our floor space constraints.
>>
>> Ok, I am not trying to convert you.  You like your Sun boxen, and that
>> is great.
>>
>> I will do a little math.  BTW: that's a fairly impressive size floor you
>> have there.  112U of X4500 or 112 X4500?
>
> We have 112 X4500 in 14 racks.  That's almost 2.7 PBytes raw, 1.9
> PBytes usable space.

Wow... color me impressed.  That's quite a bit of disk.

> According to Sun, we are the largest X4500 user in the world.
> We were already last year, since we had one machine more than the Tokyo
> Institute of Technology (featured as an "X4500 success story" on Sun's
> website).

Heh ... cool!

> [my benchmark is larger than yours :-)]

Quite possibly.

>> What I like are real application tests.  We don't see many (enough) of
>> them.  I think I have seen one customer benchmark over the last 6 years
>> that was both real (as in real operating code) and actually stressed an
>> I/O system to any significant degree.
>
> We stopped using IOzone for our tenders a few years ago and moved to a
> "model based I/O benchmark" simulating applications' I/O workloads.
> It's similar to "filebench" from Sun (but simpler) and is used to
> test more useful I/O workloads (for instance threads with different
> concurrent workloads and a few things that "filebench" does not, like
> accessing raw devices -- useful for disk procurements for our HSM or
> Oracle cluster).

:)

> My pointless result was of course mostly due to cache, with 4 threads
> each writing 1 GByte to 4 existing 2 GByte files (one file per
> thread).  The block size used was 128 kBytes, all (random) accesses are
> block aligned, and the value is the average aggregated throughput of all
> threads for a 20-minute run.
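Just to make that access pattern concrete, here is a minimal sketch of that kind of run in Python. It is not their actual tool; the file names are placeholders and the parameters are simply the ones quoted above (4 threads, 128 kByte random block-aligned writes into preexisting 2 GByte files, 1 GByte written per thread).

    import os
    import random
    import threading
    import time

    BLOCK   = 128 * 1024      # 128 kByte blocks, as in the quoted run
    FILE_SZ = 2 * 1024**3     # each thread works on an existing 2 GByte file
    TO_DO   = 1 * 1024**3     # each thread writes 1 GByte in total
    THREADS = 4
    payload = os.urandom(BLOCK)

    def worker(path, done):
        blocks = FILE_SZ // BLOCK
        with open(path, "r+b") as f:                      # file already exists at full size
            written = 0
            while written < TO_DO:
                f.seek(random.randrange(blocks) * BLOCK)  # random, block-aligned offset
                f.write(payload)
                written += BLOCK
            os.fsync(f.fileno())                          # otherwise you mostly time the page cache
        done.append(written)

    # Pre-create the (hypothetical) test files so the sketch is self-contained;
    # the quoted test assumes they already exist.
    for i in range(THREADS):
        path = f"testfile.{i}"
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.truncate(FILE_SZ)

    done, threads = [], []
    start = time.monotonic()
    for i in range(THREADS):
        t = threading.Thread(target=worker, args=(f"testfile.{i}", done))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    print(f"aggregate: {sum(done) / elapsed / 1e6:.1f} MB/s over {elapsed:.0f} s")

Without the fsync() (or O_DIRECT, or a working set well past main memory) a run like this mostly times the page cache rather than the disks, which is exactly the point below.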
I seem to remember being told in a matter-of-fact manner by someone some
time ago that only 2 GB of IO mattered to them (which was entirely cached,
BTW), so that's how they measured.  Caused me some head scratching, but,
well, ok.

My (large) concern with IOzone and related tools is that they spend most
of their time *in cache*.  It's funny: if you go look at the disks during
the smaller tests, the blinkenlights don't blinken all that often ...
(certainly not below 2 GB or so).  Then again, maybe IOzone should be
renamed "cache-zone" :)

More seriously, I made some quick source changes to be able to run IOzone
far outside cache sizes (and main memory sizes) so I could see what impact
this has on the system.  It does have a noticeable impact, and I report on
it in the benchmark report.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615