On 2 Jul 2008, at 8:26 am, Carsten Aulbert wrote:

OK, we have 1342 nodes which act as servers as well as clients. Every
node exports a single local directory, and every other node can mount it.

What we do now to optimize the available bandwidth and IOs is spread
millions of files across all nodes according to a hash algorithm (with
multiple copies as well) and then run a few thousand jobs, each opening
one file from one box, then one file from another box, and so on. With a
short autofs timeout that ought to work. Typically a single process can
open about 10-15 files per second, i.e. make 10-15 mounts per second.
With 4 parallel processes per node that's 40-60 mounts/second. With a
timeout of 5 seconds we should have roughly 200-300 concurrent mounts
(on average; no idea about the variance).
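
[For concreteness, a back-of-envelope sketch of the scheme described
above. The node count comes from the message; the copy count, the hash
choice, and all names (nodes_for_file, NUM_COPIES) are illustrative
assumptions, not the actual implementation.]

    import hashlib

    NUM_NODES = 1342   # nodes acting as both NFS servers and clients
    NUM_COPIES = 2     # "multiple copies as well" -- exact count unknown

    def nodes_for_file(path):
        """Map a file name to the nodes assumed to hold its copies."""
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return [(h + i) % NUM_NODES for i in range(NUM_COPIES)]

    # Back-of-envelope concurrency estimate from the figures above:
    # 4 processes/node * 10-15 opens/s = 40-60 mounts/s; with a 5 s
    # autofs timeout, ~200-300 mounts are live at any instant.
    mounts_per_sec = 4 * 15    # upper bound per node
    autofs_timeout_s = 5
    print(mounts_per_sec * autofs_timeout_s)  # ~300 concurrent mounts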

Please tell me you're not serious! The overheads of just performing the NFS mounts are going to kill you, never mind all the network traffic going all over the place.

Since you've distributed the files to the local disks of the nodes, surely the right way to perform this work is to schedule the computations so that each node works on the data on its own local disk, and doesn't have to talk to networked storage at all? Or don't you know in advance which files a particular job is going to need?
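
[To make that suggestion concrete: a minimal sketch of locality-aware
scheduling, reusing the hypothetical nodes_for_file() mapping from the
sketch above. The actual batch-submission call is left abstract.]

    def schedule_job(job_files):
        """Pick the node that already holds the most of this job's inputs."""
        tally = {}
        for f in job_files:
            for node in nodes_for_file(f):
                tally[node] = tally.get(node, 0) + 1
        return max(tally, key=tally.get)
        # submit the job to that node via the batch system in practice

Routing each job to a node that already holds its inputs turns every
read into a local-disk read and avoids the mount storm entirely.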

Tim

