On Tue, 30 Jun 2009, Gus Correa wrote:

> My answers were given in the context of Amjad's original questions

Sorry, I somehow missed the context for the questions. Still, the thoughts about I/O programming are general in nature, so they would apply in any case.

> Hence, he may want to follow the path of least resistance, rather than aim at the fanciest programming paradigm.

Heh, I have the impression that most scientific software starts out like that, and only if it's interesting enough (e.g. it survives the first generation of PhD students working on/with it) and gets developed further does it get some "fanciness" added to it. ;-)

> Nevertheless, what if these 1000 jobs are running on the same cluster, but doing "brute force" I/O through each of their, say, 100 processes? Wouldn't file and network contention be larger than if the jobs were funneling I/O through a single processor?

The network connection to the NFS file server, or some uplinks in an over-subscribed network, would impose the limitation - and this is a hard limit: it doesn't matter whether you divide a 10G link between 100 or 100000 down-links, it will not exceed 10G in any case; in extreme cases, the switch might not take the load and start dropping packets. Similarly for an NFS file server: it certainly makes a difference whether it needs to serve 1 client or 100 simultaneously, but beyond that point it won't matter too much how many there are (well, that was my experience anyway; I'd be interested to hear about a different one).
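To make the funneling idea concrete, here is a minimal MPI sketch of the pattern (my own illustration, not code from Amjad's program; the file name and chunk size are made up): every rank computes a local chunk, rank 0 gathers the chunks over the interconnect, and only rank 0 ever touches the file system, so the NFS server sees a single client instead of 100.

/* Minimal sketch (assumed names/sizes): funnel all I/O through rank 0. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1000;                    /* local chunk size, made up */
    double *local = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++)
        local[i] = rank + i * 1e-3;        /* stand-in for real results */

    double *all = NULL;
    if (rank == 0)
        all = malloc((size_t)N * size * sizeof(double));

    /* Funnel: collect everything on rank 0 over the MPI interconnect... */
    MPI_Gather(local, N, MPI_DOUBLE, all, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ...and let only rank 0 talk to the (NFS) file system. */
    if (rank == 0) {
        FILE *f = fopen("results.bin", "wb");
        if (f) {
            fwrite(all, sizeof(double), (size_t)N * size, f);
            fclose(f);
        }
        free(all);
    }

    free(local);
    MPI_Finalize();
    return 0;
}

The price is, of course, that rank 0 needs enough memory for the gathered data and the write itself is serialized - which is exactly the trade-off discussed above.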

> Absolutely, but the emphasis I've seen, at least for the small clusters designed for scientific computations in a small department or research group that I had the chance to know about, is to pay less attention to I/O. When one gets to the design of the filesystems and I/O, the budget is already completely used up buying a fast interconnect for MPI.

That's a mistake that I have also made. But one can learn from one's own mistakes or from the mistakes of others. I'm now trying to help others understand that the cluster is not only about CPU or MPI performance, but about the whole, including storage. So, spread the word :-)

> > [ parallel I/O programs ] always cause a problem when the number of processors is big.
>
> Sorry, but I didn't say parallel I/O programs.

No, that was me trying to condense your description into a few words to allow for more clipping - I have obviously failed...

> The opposite, however, i.e., writing the program expecting the cluster to provide a parallel file system, is unlikely to scale well on a cluster without one, or not?

I interpret your words (maybe again mistakenly) as a general remark, and I can certainly find cases where the statement is false. If you have a well thought-out network design and an NFS file server that can take the load, good scaling can still be achieved - please note, however, that I'm not necessarily referring to a Linux-based NFS file server; an "appliance" (e.g. from NetApp or Isilon) could take that role as well, although at a price.
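For contrast, here is a minimal sketch of the pattern I read into your statement, i.e. a program written with a parallel file system in mind (again my own illustration, with a made-up file name and chunk size): every rank writes its own slice of a shared file through MPI-IO. On Lustre/GPFS/PVFS the library and the file system can spread those requests over several servers; on a single NFS server they all land on the same box, which is where the scaling question comes in.

/* Minimal MPI-IO sketch (assumed names/sizes): every rank writes its own
 * slice of one shared file, as one would when a parallel FS is expected. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1000;                    /* local chunk size, made up */
    double *local = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++)
        local[i] = rank + i * 1e-3;        /* stand-in for real results */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "results.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at its own offset; the collective call lets the MPI
     * library aggregate requests if the underlying file system supports it. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, local, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(local);
    MPI_Finalize();
    return 0;
}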

> If you are on a shoestring budget, and your goal is to do parallel computing, and your applications are not particularly I/O intensive, what would you prioritize: a fast interconnect for MPI, or hardware and software for a parallel file system?

A balanced approach :-) It's important to have a clear idea of what "not particularly I/O intensive" actually means and how much value the users place on the various tasks that would run on the cluster.

> Hopefully courses like yours will improve this. If I could, I would love to go to Heidelberg and take your class myself!

Just to make this clearer: I wasn't doing the teaching myself; based on Uni Heidelberg regulations, I'd need to hold a degree (like a Dr.) to be allowed to teach. But no such restrictions apply to the practical work, which is actually the part that I find most interesting, because it's more "meaty" and a dialogue with students can take place ;-)

--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.coste...@iwr.uni-heidelberg.de
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
