On Sat, 2007-06-02 at 20:47 -0700, [EMAIL PROTECTED] wrote:
> Date: Fri, 01 Jun 2007 11:24:40 -0400
> From: Wally Edmondson <[EMAIL PROTECTED]>
> Subject: Re: [Beowulf] IBRIX Experiences
> To: beowulf@beowulf.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On Thu, 10 May 2007, Ian Reynolds wrote:
>
> > Hey all -- we're considering IBRIX for a parallel storage cluster solution with an EMC CLARiiON CX3-20 at the center, as well as a handful of storage servers -- total of roughly 40 client servers, mix of 32- and 64-bit OSs.
> >
> > Can anyone offer their experiences with IBRIX, good or bad? We have worked with GPFS extensively, so any comparisons would also be helpful.
>
> It looks like you aren't getting many answers to your question, Ian. I'll quickly share my IBRIX experiences. I have been running IBRIX since late 2004 on around 540 diskless clients and 50 regular servers and workstations, with 8 segment servers and a Fusion Manager connected to a DDN S2A 3000 couplet with 20 TB of usable storage. The storage is 1 Gb Fibre Channel to the segment servers, and it's non-bonded GigE for everything else.
>
> I'll start with the bad, I guess. We had our share of problems with the 1.x version of the software in the early days. I suppose all parallel filesystems with 600 clients are going to hit bumps; that's what CFS said back then, anyway. Stability wasn't a problem, but occasionally a file wouldn't be readable, and to fix it you had to copy the file, stuff like that. This was no longer an issue beginning with version 2.0. You have to get a new build of the software if you want to change kernels. There are two RPMs: one generic for the major kernel number, and one specific to your kernel containing some modules. They only support RHEL/CentOS and SLES as far as I know, and SLES was only recently added. I asked about Ubuntu and they don't yet support it, which sucks because I would like to use it on some workstations.
>
> Oh, and make sure that the segment servers can always see each other. Use at least two links through different switches. We had some bad switch ports that caused the segment servers to miss heartbeats, which caused automatic failovers to segment servers that also couldn't be seen. This is a disaster. I thought it was IBRIX's fault the whole time; it turned out to be intermittent switch port problems. It was avoidable with a little more planning and a better understanding of how the whole thing worked.
>
> Redundancy is set up with buddies rather than globally, so you tell it that one server should watch some other server's back. It works, but it could be a problem if a failing server's buddy is down, or if a server goes down while it owns a failed segment. In either case, some percentage of your files won't be accessible until one of the servers is fixed. It hasn't happened to me, but it is a possibility. I can bring down four of my eight servers without a problem, for instance, but it needs to be the right four. Servers have failed and it has never been a problem for me. The running jobs never know the difference.
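A rough way to picture the buddy arrangement Wally describes: each server has one designated partner, and a segment only goes dark when both members of a pair are out at the same time. A minimal Python sketch of that pairing logic (the server names and the buddy map are made up, and this is not IBRIX code; it also ignores the case where a server is already carrying a failed buddy's segments):

# Toy model of buddy-style failover. Not IBRIX code; hostnames and the
# buddy map below are hypothetical, for illustration only.
BUDDIES = {                      # server -> its designated buddy
    "seg1": "seg2", "seg2": "seg1",
    "seg3": "seg4", "seg4": "seg3",
    "seg5": "seg6", "seg6": "seg5",
    "seg7": "seg8", "seg8": "seg7",
}

def unreachable_segments(down):
    """Servers whose segments are inaccessible: the server is down and
    so is the only buddy that could have taken its segments over."""
    down = set(down)
    return sorted(s for s in down if BUDDIES[s] in down)

# Losing the "right" four servers is harmless...
print(unreachable_segments(["seg1", "seg3", "seg5", "seg7"]))   # []
# ...but losing both members of a buddy pair orphans that pair's segments.
print(unreachable_segments(["seg1", "seg2", "seg5"]))           # ['seg1', 'seg2']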
> Support has been top-notch. Last year we had a catastrophic storage controller failure following a scheduled power outage: major corruption, the works. A guy at IBRIX stayed with me all weekend on the phone and AIM. He logged in and remotely restored all the files he could (tens of thousands). Apparently he could have restored more if I had already been running 2.0 or higher. They know their product very well.
>
> I'm not sure I am the right person to compare it to GPFS or Lustre, since I looked into those products back in 2004 and haven't really researched them since. My setup is simple, too, so I only use the basics. The performance is fine, using nearly all of my GigE pipes. With more segment servers and faster storage you could get some pretty amazing speeds. I don't use the quotas or multiple interfaces. Their GUI looks nice at first, but you really don't need it because their command-line tools make sense and have excellent help output if you forget something. Adding new clients is a breeze. There is a Windows client now, but I haven't used it. I use CIFS exports and they work just fine. I also use NFS exports for my few remaining Solaris clients. Everything is very customizable, and the documentation seems pretty thorough. You can put any storage you like behind it, which is nice. I think I could use USB keys if I felt like it.
>
> I have been very pleased with IBRIX overall, especially since we upgraded out of 1.x land. It's usually the last thing on my mind, so I guess that's a good thing. That's all I have time for right now. Let me know if you have any specific questions.
>
> Wally
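The advice above about running two heartbeat links through different switches is worth taking seriously; a flaky port can look exactly like a dead server. A quick, generic way to confirm every segment server answers on both paths (the hostname scheme is hypothetical, and this just shells out to plain ping, nothing IBRIX-specific):

# Reachability check for segment-server heartbeat paths. Generic sketch:
# the hostnames and the "-hb0"/"-hb1" per-link naming are assumptions.
import subprocess

SEGMENT_SERVERS = ["seg1", "seg2", "seg3", "seg4",
                   "seg5", "seg6", "seg7", "seg8"]
SUFFIXES = ["-hb0", "-hb1"]   # one name per heartbeat link/switch

def reachable(host):
    """Send one ping with a 2 s timeout; True if the host answered."""
    return subprocess.run(["ping", "-c", "1", "-W", "2", host],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0

for server in SEGMENT_SERVERS:
    status = {sfx: reachable(server + sfx) for sfx in SUFFIXES}
    if not all(status.values()):
        # A server that answers on one path but not the other is exactly
        # the kind of flaky link that triggers bad failovers.
        print(f"{server}: {status}")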
I would agree with some of this. The support is indeed top-notch, but our switch to 2.x wasn't as smooth: we have had some problems with files not writing, and some performance issues. This is being used on 520 nodes, and a lot of our recent problems have been related to IBRIX. That said, IBRIX has been very good about helping fix things, and I have had the same experience with them being there when I needed them. When I have a problem, they work on it until it is fixed, regardless of whether it is nighttime or a weekend. At this point, I think we are stable, and you probably would not have the same issues on a new system.

--
Naveed Near-Ansari
California Institute of Technology
Division of Geology and Planetary Science

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf