On Wed, 2 Sep 2009 at 10:29pm, Rahul Nabar wrote
That brings me to another important question: any hints on speccing
the head node? Especially the kind of storage to put in it. I need
around 1 terabyte of storage. In the past I've used RAID5 + SAS in the
server, mostly for running jobs that do their I/O via files stored
centrally.
For muscle I was thinking of a Nehalem-based Xeon E5520 with 16 GB of
RAM. Should I boost the RAM? Any other comments? It is tricky to spec
the central node.
Or is it more advisable to go for a storage box external to the server
for the NFS stores, and then figure out a fast way of connecting it to
the server? Fiber, perhaps?
Speccing storage for a 300 node cluster is a non-trivial task and is
heavily dependent on your expected access patterns. Unless you anticipate
vanishingly little concurrent access, you'll be very hard pressed to
service a cluster that large with a basic Linux NFS server. About a year
ago I had ~300 nodes pointed at a NetApp FAS3020 with 84 spindles of 10K
RPM FC-AL disks. A single user could *easily* flatten the NetApp (read:
100% CPU and multi-second-to-minute latencies for everybody else)
without even using the whole cluster.
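
For what it's worth, a crude way to find out where a candidate server
falls over is to launch a small write probe on 1, 10, 50, ... nodes at
once and watch the per-pass times climb. Something along these lines
(just a sketch -- the /shared/scratch mount point and the block/pass
sizes are placeholders, and you'd kick it off with pdsh or your
scheduler):

    #!/usr/bin/env python
    # Rough concurrency probe for a shared filesystem.  Run it on a
    # growing number of nodes at once and watch how write latency
    # degrades as clients are added.  Mount point and sizes are
    # made-up defaults -- adjust for your environment.
    import os
    import sys
    import time
    import socket

    MOUNT = sys.argv[1] if len(sys.argv) > 1 else "/shared/scratch"
    BLOCK = 1024 * 1024   # 1 MiB per write
    COUNT = 256           # 256 MiB per pass
    PASSES = 5

    def one_pass(path):
        """Write COUNT blocks, fsync, and return elapsed seconds."""
        buf = os.urandom(BLOCK)
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(COUNT):
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())
        return time.time() - start

    def main():
        host = socket.gethostname()
        path = os.path.join(MOUNT, "probe-%s-%d.dat" % (host, os.getpid()))
        for i in range(PASSES):
            elapsed = one_pass(path)
            mb_s = (BLOCK * COUNT) / (1024.0 * 1024.0) / elapsed
            print("%s pass %d: %.1f s, %.1f MB/s" % (host, i + 1, elapsed, mb_s))
        os.unlink(path)

    if __name__ == "__main__":
        main()

If the per-pass times balloon long before you reach realistic client
counts, a single plain NFS box isn't going to cut it for that workload.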
Whatever you end up with for storage, you'll need to be vigilant regarding
user education. Jobs should store as much in-process data as they can on
the nodes (assuming you're not running diskless nodes) and large jobs
should stagger their access to the central storage as best they can.
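
To make that last point concrete, the pattern we try to get users to
adopt looks roughly like the wrapper below (a sketch only -- the paths
and the "solver" binary are made up): stage input once, do all
intermediate I/O in node-local scratch, and add a random back-off
before copying results to the central store.

    #!/usr/bin/env python
    # Illustrative job wrapper: stage data to node-local scratch,
    # compute there, and add a random delay before copying results
    # back so a large job array doesn't hit the central NFS store
    # all at once.  Paths and the compute step are placeholders.
    import os
    import random
    import shutil
    import subprocess
    import time

    CENTRAL = "/shared/project/run42"           # NFS-mounted project dir
    SCRATCH = os.environ.get("TMPDIR", "/tmp")  # node-local scratch

    def main():
        workdir = os.path.join(SCRATCH, "job-%d" % os.getpid())
        os.makedirs(workdir)

        # Stage input from central storage once, at the start.
        shutil.copy(os.path.join(CENTRAL, "input.dat"), workdir)

        # Do all intermediate I/O against local disk, not the NFS mount.
        subprocess.check_call(["./solver", "input.dat"], cwd=workdir)

        # Stagger the copy-back: sleep 0-300 s so hundreds of tasks
        # finishing together don't flatten the file server.
        time.sleep(random.uniform(0, 300))
        shutil.copy(os.path.join(workdir, "output.dat"), CENTRAL)
        shutil.rmtree(workdir)

    if __name__ == "__main__":
        main()

The random sleep is the cheap trick: when a few hundred tasks finish
within the same minute, spreading the copy-back over several minutes
keeps the file server responsive for everyone else.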
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF