Hello Håkon,
on Friday, 25 April 2008, you wrote:
HB> Hi Jan,
HB> At Wed, 23 Apr 2008 20:37:06 +0200, Jan Heichler <[EMAIL PROTECTED]> wrote:
>> From what I saw OpenMPI has several advantages:
>> - better performance on multi-core systems
>> because of a good shared-memory implementation
HB> A couple of months ago, I conducted a thorough
HB> study on intra-node performance of different MPIs
HB> on Intel Woodcrest and Clovertown systems. I
HB> systematically tested pnt-to-pnt performance
HB> between processes on a) the same die on the same
HB> socket (sdss), b) different dies on same socket
HB> (ddss) (not on Woodcrest of course) and c)
HB> different dies on different sockets (ddds). I
HB> also measured the message rate using all 4 / 8
HB> cores on the node. The pnt-to-pnt benchmarks used
HB> were ping-ping and ping-pong (Scali's `bandwidth' and
HB> osu_latency+osu_bandwidth).
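(For anyone who hasn't run these micro-benchmarks: the ping-pong test boils
down to something like the sketch below. This is my own minimal version, not
the Scali or OSU code; the 8-byte size, warm-up and iteration counts are
arbitrary placeholders.)

/* Minimal ping-pong latency sketch (not the actual Scali/OSU code):
 * rank 0 sends an 8-byte message to rank 1, which echoes it back;
 * half of the average round-trip time approximates the one-way latency. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000, warmup = 100, msg_size = 8;
    char buf[8];
    int rank;
    double t0 = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < iters + warmup; i++) {
        if (i == warmup)
            t0 = MPI_Wtime();               /* start timing after warm-up */
        if (rank == 0) {
            MPI_Send(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("one-way latency: %.2f usec\n",
               (MPI_Wtime() - t0) * 1e6 / iters / 2.0);
    MPI_Finalize();
    return 0;
}

(Ping-ping is the same idea except that both sides send at once, which
stresses the bidirectional path.)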
HB> I evaluated Scali MPI Connect 5.5 (SMC), SMC 5.6,
HB> HP MPI 2.0.2.2, MVAPICH 0.9.9, MVAPICH2 0.9.8, Open MPI 1.1.1.
HB> Of these, Open MPI was the slowest for all
HB> benchmarks and on all machines, up to 10 times slower than SMC 5.6.
Aren't you going to share these benchmark results with us? It would be very
interesting to see them!
HB> Now since Open MPI 1.1.1 is quite old, I just
HB> redid the message rate measurement on an X5355
HB> (Clovertown, 2.66GHz). At an 8-byte message size,
HB> OpenMPI 1.2.2 achieves 5.5 million messages per
HB> second, whereas SMC 5.6.2 reaches 16.9 million
HB> messages per second (using all 8 cores on the node, i.e., 8 MPI processes).
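(As I understand it, such a message-rate test is a windowed stream of small
nonblocking sends per rank pair, roughly like the sketch below - my own guess
at the methodology, not Håkon's code. It assumes an even number of ranks, and
the window and iteration counts are placeholders.)

/* Sketch of an aggregate message-rate test: ranks are paired (0<->1, 2<->3,
 * ...); each even rank streams 'window' nonblocking 8-byte sends per
 * iteration, the odd rank posts matching receives. Total messages divided
 * by elapsed time gives the aggregate messages/second. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 1000, window = 64, msg_size = 8;
    char sbuf[8], rbuf[64][8];
    MPI_Request req[64];
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* assumes size is even */
    int peer = (rank % 2 == 0) ? rank + 1 : rank - 1;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        for (int w = 0; w < window; w++) {
            if (rank % 2 == 0)                 /* sender side of the pair */
                MPI_Isend(sbuf, msg_size, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &req[w]);
            else                               /* receiver side */
                MPI_Irecv(rbuf[w], msg_size, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &req[w]);
        }
        MPI_Waitall(window, req, MPI_STATUSES_IGNORE);
    }
    double elapsed = MPI_Wtime() - t0;         /* rank 0's time as a rough
                                                  stand-in for all pairs */
    if (rank == 0)
        printf("aggregate rate: %.1f million msgs/s\n",
               (double)(size / 2) * iters * window / elapsed / 1e6);
    MPI_Finalize();
    return 0;
}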
HB> Comparing OpenMPI 1.2.2 with SMC 5.6.1 on
HB> ping-ping latency (usec) for an 8-byte payload yields:
HB> mapping   OpenMPI   SMC
HB> sdss      0.95      0.18
HB> ddss      1.18      0.12
HB> ddds      1.03      0.12
Impressive. But I never doubted that commercial MPIs are faster.
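(To reproduce the sdss/ddss/ddds mappings the two processes have to be pinned
to specific cores. I don't know how Håkon did that; one way is to set the
affinity from inside the program via Linux sched_setaffinity, as in the sketch
below - the core IDs are placeholders and depend on how the kernel numbers the
cores on the box, so check /proc/cpuinfo first.)

/* Sketch: pin each of the two benchmark ranks to an explicit core so they
 * land on the desired pair (same die/same socket, different dies, ...).
 * The core numbers below are placeholders, not a verified mapping. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

static void pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");
}

int main(int argc, char **argv)
{
    /* e.g. {0, 1} for sdss, {0, 2} for ddss, {0, 4} for ddds --
       placeholder numbering, verify against the actual core layout */
    const int core_for_rank[2] = { 0, 4 };
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank < 2)
        pin_to_core(core_for_rank[rank]);
    /* ... run the ping-ping/ping-pong loop from the earlier sketch ... */
    MPI_Finalize();
    return 0;
}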
HB> So, Jan, I would be very curious to see any documentation of your claim
HB> above!
I did a benchmark of a customer application on an 8-node dual-socket dual-core
Opteron cluster - unfortunately I can't remember the name.
I used OpenMPI 1.2, MPICH 1.2.7p1, MVAPICH 0.97-something, and Intel MPI 3.0,
IIRC.
I don't have the detailed data available, but from memory:
Across nodes, latency was worst with MPICH (just TCP/IP ;-) ), then Intel MPI,
then OpenMPI, with MVAPICH the fastest.
On a single machine MPICH was the worst, then MVAPICH, then OpenMPI - Intel MPI
was the fastest.
The difference between MVAPICH and OpenMPI was quite big - Intel MPI had only a
small advantage over OpenMPI.
Since this was not a low-level benchmark I don't know which communication
pattern the application used, but it seemed to me that the shared-memory
communication in OpenMPI and Intel MPI was far better than in the other two.
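(If someone wants to repeat this kind of test: a quick way to see which rank
pairs actually run intra-node - and therefore use the shared-memory path
instead of the interconnect - is to gather the processor names. A small sketch
of my own:)

/* Print which host every rank runs on, so intra-node pairs are obvious. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char name[MPI_MAX_PROCESSOR_NAME];
    char *all = NULL;
    int len, rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    memset(name, 0, sizeof(name));
    MPI_Get_processor_name(name, &len);

    if (rank == 0)
        all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Gather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               all, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        for (int r = 0; r < size; r++)
            printf("rank %d runs on %s\n", r, all + r * MPI_MAX_PROCESSOR_NAME);
        free(all);
    }
    MPI_Finalize();
    return 0;
}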
Cheers,
Jan
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf