I've not tried their respective MPI libraries, but as a general rule, the people who manufacture the chips have the best idea of how to optimize a given library. (There are obvious counterexamples; GotoBLAS and FFTW come to mind.)
That said, have you tried Intel's MPI (http://www.intel.com/cd/software/products/asmo-na/eng/308295.htm) or AMD's developer tools (http://developer.amd.com/devtools.jsp; they link to HP's MPI)?

As a side note, IBM uses a slightly modified version of MPICH for Blue Gene.

Nathan

On Nov 28, 2007 9:48 AM, Christian Bell <[EMAIL PROTECTED]> wrote:
> But the main point with MPI implementations, even more than usual
> with shared memory, is to run your own application.
>
> Two different shared-memory MPI implementations that show equal
> performance on point-to-point microbenchmarks can still show very
> different performance in applications (mostly in the bandwidth-bound
> regime).
>
> Microbenchmarks assume senders and receivers are always synchronized
> in time, and they report performance for memory copies that go
> mostly through the cache. Transfers that fall mostly outside the
> cache are rarely tuned for, or even measured.
>
> Microbenchmarks also never have the receiver actually consume the
> data it receives, or have the sender re-reference the data it sent,
> as a computation would. The cost of these application-level memory
> accesses is largely determined by where in the memory hierarchy the
> MPI implementation left the data to be computed on. And finally, a
> given implementation will have very different performance
> characteristics on Opteron versus Intel, on few cores versus many
> cores, and on point-to-point versus collectives.
>
> It's safe to assume that most if not all MPIs try to do something
> about shared memory, but I wouldn't be surprised if each of them
> tops out on some performance curve on some specific system.
>
>     . . christian
>
> On Wed, 28 Nov 2007, amjad ali wrote:
>
> > Hello,
> >
> > Today, clusters with multicore nodes are quite common, and the
> > cores within a node share memory.
> >
> > Which MPI implementations (commercial or free) make automatic and
> > efficient use of shared memory for message passing within a node?
> > (That is, which MPI libraries automatically communicate over
> > shared memory, rather than over the interconnect, when ranks are
> > on the same node.)
> >
> > regards,
> > Ali.
>
> --
> [EMAIL PROTECTED]
> (QLogic Host Solutions Group, formerly Pathscale)

--
- - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - - - - - - - - - - - - - - - -
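Christian's point about consuming the received data is easy to see for
yourself. Here is a minimal sketch (my own, not from any of the posts
above): a standard two-rank ping-pong in C, timed once the usual way and
once with the receiver summing the buffer it just received. The message
size, iteration count, and reduction loop are all arbitrary choices; on
a multicore node the two timings tend to diverge once the message no
longer fits in cache.

    /* pingpong.c -- ping-pong latency, with and without the receiver
     * actually touching the data.  Illustrative only; build with
     * "mpicc pingpong.c" and run with at least two ranks. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N     (1 << 20)   /* 1 MiB per message (arbitrary)  */
    #define ITERS 100         /* round trips per measurement    */

    int main(int argc, char **argv)
    {
        int rank, i, j, touch;
        double t0, t1;
        volatile char sink = 0;        /* keeps the sum "live"   */
        char *buf = calloc(N, 1);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (touch = 0; touch <= 1; touch++) {
            MPI_Barrier(MPI_COMM_WORLD);
            t0 = MPI_Wtime();
            for (i = 0; i < ITERS; i++) {
                if (rank == 0) {
                    MPI_Send(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    MPI_Recv(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    if (touch)   /* consume the data, as an
                                    application would */
                        for (j = 0; j < N; j++)
                            sink += buf[j];
                    MPI_Send(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            t1 = MPI_Wtime();
            if (rank == 0)
                printf("touch=%d: %.1f us per round trip\n",
                       touch, 1e6 * (t1 - t0) / ITERS);
        }

        MPI_Finalize();
        free(buf);
        return 0;
    }

The interesting comparison is then running the same binary with both
ranks on one node versus on two.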
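And on Ali's original question: the two big open-source implementations
both do this. Open MPI has a dedicated shared-memory transport (the
"sm" BTL) that it normally selects on its own for ranks on the same
node, and MPICH2's nemesis channel likewise uses shared memory within a
node. If memory serves, you can pin the behavior down explicitly along
these lines (the process count and binary name are placeholders):

    # Open MPI: permit only the shared-memory BTL for on-node
    # traffic (self for loopback, tcp for off-node peers)
    mpirun --mca btl self,sm,tcp -np 4 ./myapp

    # MPICH2: choose the nemesis channel at configure time; it
    # uses shared memory for on-node messages
    ./configure --with-device=ch3:nemesis

The vendor MPIs above presumably do the same, but I'd check their
release notes rather than take my word for it.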
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf