Hello Greg,

On Thursday, September 25, 2008, you wrote:


> Glen,
>
> I have had great success with the *right* 10GbE NIC and NFS.  The important
> things to consider are:


I have to say my experience was different. 



> How much bandwidth will your backend storage provide?  2 x FC 4 I'm guessing
> best case is 600Mb but likely less.


600 MB/s is already a good value for SAN-based storage ;-)
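
For reference, a quick back-of-the-envelope (my assumption being that "FC 4" means two 4Gb Fibre Channel links, each carrying roughly 400 MB/s of payload after 8b/10b encoding):

    # rough sketch of the link math, not a measurement
    links = 2
    payload_mb_per_link = 400   # ~payload of one 4Gb FC link
    print(links * payload_mb_per_link, "MB/s raw")
    # -> 800 MB/s raw, so ~600 MB/s sustained from the array is plausible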


> What access patterns do the "typical apps" have?
> All nodes read from a single file (no prob for NFS, and fscache may help even
> more)
> All nodes write to a single file (NFS may need some help or may be too slow
> when tuned for this)
> All nodes read and write to separate files (NFS is fine if the files aren't too
> big for the OS to cache reasonably).

> The number of IO servers really is a function of how much disk throughput you
> have on the backend, frontend, and through the kernel/filesystem goo.  My
> experience is a 10GbE NIC from Myricom can easily sustain 500-700MB/s if the
> storage behind it can and the access patterns aren't evil.  Other NICs


My experience was this: over the network you get approximately half of what you 
have at the block-device level. I had a setup with 16 x 15k rpm SAS drives. A 
RAID5 across them showed 1.1 GB/s read (probably limited by the PCIe x8 slot) 
and 550 MB/s write (the controller was an LSI 8888ELP). Exporting this to a 
number of clients, I was not able to get more than approximately 500 MB/s read 
and 400 MB/s write, even with multiple clients. I can share the real 
measurements if that is of interest.
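
In case anyone wants to reproduce the block-device side of such numbers, here 
is a minimal sketch of the kind of sequential-read timing I mean. The device 
path and sizes are placeholders, and for serious numbers you would use dd, 
IOzone, or similar:

    #!/usr/bin/env python3
    import os, sys, time

    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdb"  # placeholder device
    block = 1024 * 1024        # 1 MiB per read
    total = 4 * 1024 ** 3      # read 4 GiB, enough to dwarf most caches

    fd = os.open(dev, os.O_RDONLY)
    # drop any pages already cached for this device, so reruns stay honest
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

    start = time.monotonic()
    done = 0
    while done < total:
        buf = os.read(fd, block)
        if not buf:
            break              # device/file smaller than total
        done += len(buf)
    elapsed = time.monotonic() - start
    os.close(fd)

    print(f"{done / 1024**2:.0f} MiB in {elapsed:.1f} s "
          f"= {done / 1024**2 / elapsed:.0f} MB/s")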

If you look at the hardware that was thrown at the problem, the result is a 
little pathetic.

My experience with Lustre is that it eats up 10 to 15% of the block-device 
speed, and the rest you get over the network.
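
Applied to the RAID numbers above, that rule of thumb works out like this 
(just the arithmetic, not a measurement):

    # 10-15% Lustre overhead on the measured block-device speeds
    read_mb, write_mb = 1100, 550
    for overhead in (0.10, 0.15):
        print(f"read ~{read_mb * (1 - overhead):.0f} MB/s, "
              f"write ~{write_mb * (1 - overhead):.0f} MB/s")
    # -> roughly 935-990 MB/s read and 470-495 MB/s write over the wire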

So a cheap Lustre setup for scratch would probably consist of two servers with 
internal storage, exported to the cluster over 10GbE or InfiniBand. Internal 
storage is cheap, and it is easy to achieve 500+ MB/s with SATA drives. That 
way you can reach 1 GB/s aggregate with just two servers and 32 to 48 disks 
involved.
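
To make the sizing explicit, under the assumption above of roughly 500 MB/s 
usable per server:

    import math
    # how many Lustre I/O servers for a target aggregate scratch bandwidth
    per_server_mb = 500          # usable per server, per the SATA estimate
    target_mb = 1000             # desired aggregate, MB/s
    servers = math.ceil(target_mb / per_server_mb)
    print(f"{servers} servers -> ~{servers * per_server_mb} MB/s aggregate")
    # -> 2 servers -> ~1000 MB/s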


> from large and small vendors can fall apart at 3-4 Gb so be careful and test
> the network first before assuming your FS is the troublemaker.  There are cheap
> switches with 2 or 4 10GbE CX4 connectors that make this much simpler and safer
> with or without the Parallel FS options.


I never tested anything but Myricom 10GbE, but you can find cheap Intel-based 
cards with CX4 (and I doubt that they are bad). The Dell PowerConnect 62xx 
series can give you cheap CX4 uplinks, and you get a decent switch that is 
stackable.



> Depending on how big/small and how "scratch" the need is... a big tmpfs/ramdisk
> can be fun :)


I once tried to export a tmpfs via NFS; it didn't work out of the box.
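
For what it's worth, the usual stumbling block is that tmpfs has no stable 
device number or UUID, so the kernel NFS server wants an explicit fsid= in the 
export options before it will export it. A sketch of what the entry might look 
like (path and network are placeholders):

    # /etc/exports -- hypothetical entry; fsid= is what tmpfs needs
    /mnt/ramdisk  10.0.0.0/24(rw,no_subtree_check,fsid=1)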

Bye Jan                            
