Re: [Beowulf] 512 nodes Myrinet cluster Challanges

Robert G. Brown Fri, 28 Apr 2006 16:46:39 -0700

On Fri, 28 Apr 2006, David Kewley wrote:

By the way, the idea of rolling-your-own hardware on a large cluster, and
planning on having a small technical team, makes me shiver in horror.  If
you go that route, you better have *lots* of experience in clusters, and
make very good decisions about cluster components and management methods.
If you don't, your users will suffer mightily, which means you will suffer
mightily too.


I >>have<< lots of experience in clusters and have tried rolling my own
nodes for a variety of small and medium sized clusters. Let me clarify.
For clusters with more than perhaps 16 nodes, or EVEN 32 if you're
feeling masochistic and inclined to heartache:

Don't.

Or you will have a really high probability of being very, very sorry.

16 node clusters I've done "ok" with, in the sense that the problems
were manageable.  >32 node clusters, especially if you encounter ANY
ex post facto problems with the hardware configuration -- including ones
that passed through your original prototyping runs (and yeah, they
exist) -- rapidly descend into circle-of-hell type experiences.
Expensive ones.  Much more expensive in real money, let alone time, than
just buying nodes from a quality vendor with a 3-4 year onsite service
contract, so that if they break the vendor will come fix them (but they
don't break -- see the word "quality" in the above:-).

Other than thinking that "shiver in horror" is somehow inadequate to
describe the potential for misery, I endorse pretty much everything else
David (and Mark) said -- both these guys know whereof they speak.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
