-- snipped some good advice --

>> However, I do not understand what happens when you have
>> multi-processor/multi-core nodes in a cluster. Do you just use MPI
>> (with each thread using its own non-shared memory) or is there any
>> way to do "mixed-mode" programming which takes advantage of shared
>> memory within a node (like, an MPI/OpenMP hybrid?).
>
> The first is the easiest. MPI takes advantage of shared memory within
> the node.
>
> The hybrid model is a lot more work for the programmer, and often is
> slower than pure MPI. And it hurts interconnect performance because you
> usually end up with just 1 core driving the interconnect.
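(For the archives: "mixed-mode" here means MPI between nodes plus OpenMP
threads within each node. Below is a minimal sketch of the pattern in C;
it is purely illustrative -- the trivial do_work() and the build line are
made up, and real hybrid codes are of course far more involved.)

    /* hybrid.c -- sketch: MPI between nodes, OpenMP within a node.
       Build (assuming an MPI compiler wrapper around GCC):
           mpicc -fopenmp hybrid.c -o hybrid
       Run with one MPI rank per node, OMP_NUM_THREADS threads each. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    /* Stand-in for real per-thread computation (made up). */
    static double do_work(int rank, int thread)
    {
        return (double)(rank * 100 + thread);
    }

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* FUNNELED: only the master thread will make MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Shared memory inside the node: OpenMP threads... */
        double local = 0.0;
        #pragma omp parallel reduction(+:local)
        local += do_work(rank, omp_get_thread_num());

        /* ...message passing between nodes.  Note that only one
           thread per node talks to the network here -- the "1 core
           driving the interconnect" effect quoted above.           */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %g\n", total);

        MPI_Finalize();
        return 0;
    }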
This is a non-obvious result that many find hard to believe: MPI on the
same node may be faster than some shared-memory/threaded mode. (Of
course, it all depends on the application, etc.)

Furthermore, in some recent NAS Parallel Benchmark runs on quad-core
Xeons (dual-socket motherboards, 8 cores per board), LAM-MPI/tcp did
better than LAM-MPI/sysv or LAM-MPI/usysv. I have not done any tuning to
see if that helps; I should have the hardware back soon, though. (Not
allowed to give hard numbers just yet, sorry.) If you want to try the
comparison yourself, see the P.S. below for switching transports.

Finally, hybrid models also start becoming very hardware-specific, and
if the pay-off is not that great, you *may* have spent a lot of time
making your code less portable.

These are very good questions, by the way; multi-core is changing some
things.

--
Doug
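P.S. If anyone wants to repeat the tcp vs. shared-memory comparison: in
LAM/MPI 7.x the transport (RPI) is selectable at run time through an SSI
parameter, roughly as below. This is from memory, so check your
version's docs; "nasbench" is just a placeholder binary name, and "C"
means one process per CPU in the boot schema.

    mpirun -ssi rpi tcp   C ./nasbench   # TCP, even between on-node ranks
    mpirun -ssi rpi sysv  C ./nasbench   # SysV shared memory within a node
    mpirun -ssi rpi usysv C ./nasbench   # shared memory with spin locks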