Hallo Håkon,

On Friday, 25 April 2008, you wrote:

HB> Hi Jan,

HB> At Wed, 23 Apr 2008 20:37:06 +0200, Jan Heichler <[EMAIL PROTECTED]> wrote:
>> From what I saw, Open MPI has several advantages:

>> - better performance on multi-core systems
>> because of a good shared-memory implementation


HB> A couple of months ago, I conducted a thorough
HB> study of the intra-node performance of different MPIs
HB> on Intel Woodcrest and Clovertown systems. I
HB> systematically tested point-to-point performance
HB> between processes on a) the same die on the same
HB> socket (sdss), b) different dies on the same socket
HB> (ddss) (not on Woodcrest, of course) and c)
HB> different dies on different sockets (ddds). I
HB> also measured the message rate using all 4 / 8
HB> cores on the node. The point-to-point benchmarks used
HB> were ping-ping and ping-pong (Scali's "bandwidth" and
HB> osu_latency + osu_bandwidth).
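
For anyone who wants to reproduce this kind of measurement, below is a minimal
ping-pong latency sketch in C. It is only an illustration, not the tools Håkon
used (osu_latency and Scali's benchmarks add warm-up iterations and
message-size sweeps); run it with exactly two ranks, pinned to whichever pair
of cores you want to compare.

/* Minimal ping-pong latency sketch (NOT osu_latency or Scali's tool -
 * no warm-up, no message-size sweep). Run with exactly two ranks;
 * rank 0 reports half the average round-trip time. */
#include <mpi.h>
#include <stdio.h>

#define ITERS    100000
#define MSG_SIZE 8            /* 8-byte payload, as in the numbers below */

int main(int argc, char **argv)
{
    int rank, i;
    char buf[MSG_SIZE] = {0};
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency: %.2f usec\n",
               (t1 - t0) / ITERS / 2.0 * 1.0e6);

    MPI_Finalize();
    return 0;
}

Compile with mpicc and launch with, e.g., mpirun -np 2 ./pingpong.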

HB> I evaluated Scali MPI Connect 5.5 (SMC), SMC 5.6, 
HB> HP MPI 2.0.2.2, MVAPICH 0.9.9, MVAPICH2 0.9.8, Open MPI 1.1.1.

HB> Of these, Open MPI was the slowest for all 
HB> benchmarks and all machines, up to 10 times slower than SMC 5.6.


You are not going to share these benchmark results with us, are you? It would
be very interesting to see them!

HB> Now since Open MPI 1.1.1 is quite old, I just 
HB> redid the message rate measurement on an X5355 
HB> (Clovertown, 2.66GHz). At an 8-byte message size,
HB> OpenMPI 1.2.2 achieves 5.5 million messages per
HB> second, whereas SMC 5.6.2 reaches 16.9 million
HB> messages per second (using all 8 cores on the node, i.e., 8 MPI processes).
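
(Per process that works out to 16.9e6/8 ~ 2.1 million messages per second for
SMC versus 5.5e6/8 ~ 0.69 million for OpenMPI 1.2.2 - roughly a factor of
three.)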

HB> Comparing OpenMPI 1.2.2 with SMC 5.6.1 on 
HB> ping-ping latency (usec) for an 8-byte payload yields:

HB> mapping OpenMPI   SMC
HB> sdss       0.95  0.18
HB> ddss       1.18  0.12
HB> ddds       1.03  0.12

Impressive. But I never doubted that commercial MPIs are faster.
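
To reproduce the sdss/ddss/ddds mappings, one generic way is to pin the two
ranks to specific cores yourself, for example with numactl under mpirun's MPMD
(colon) syntax. The core numbers below are only hypothetical - check
/proc/cpuinfo (physical id / core id) to see which logical CPUs actually share
a die and a socket on your box:

# hypothetical numbering: cores 0 and 1 share a die; pick 0 and 2 for
# different dies on one socket, 0 and 4 for different sockets
mpirun -np 1 numactl --physcpubind=0 ./pingpong : \
       -np 1 numactl --physcpubind=1 ./pingpong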

HB> So, Jan, I would be very curious to see any documentation of your claim
HB> above!

I did a benchmark of a customer application on an 8-node dual-socket, dual-core
Opteron cluster - unfortunately I can't remember the name.

I used OpenMPI 1.2, MPICH 1.2.7p1, MVAPICH 0.97-something, and Intel MPI 3.0,
IIRC.

I don't have the detailed data available, but from memory:

Latency was worst for MPICH (just TCP/IP ;-) ), then Intel MPI, then OpenMPI,
with MVAPICH the fastest.
On a single machine MPICH was the worst, then MVAPICH, then OpenMPI;
Intel MPI was the fastest.

The difference between MVAPICH and OpenMPI was quite big - Intel MPI had only a
small advantage over OpenMPI.


Since this was not a low-level benchmark, I don't know which communication
pattern the application used, but it seemed to me that the shared-memory
configuration in OpenMPI and Intel MPI was far better than in the other two.
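
For what it is worth, with Open MPI you can at least force intra-node traffic
onto the shared-memory BTL and inspect its tunables. A minimal sketch from
memory of the 1.2 series (./app is just a placeholder, and the parameter names
are worth double-checking with ompi_info on your own installation):

# restrict transports to the shared-memory BTL (plus self for loopback)
mpirun -np 4 --mca btl self,sm ./app
# list the parameters of the sm BTL
ompi_info --param btl sm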

Cheers,
Jan