Patrick Geoffray wrote:
Greg Lindahl wrote:
On Wed, Jun 28, 2006 at 07:28:53AM -0400, Patrick Geoffray wrote:

I have kept quiet even when you were saying things driven by
marketing rather than technical considerations (the packet per
second nonsense),

Patrick, that "packet per second nonsense" is the technical reason our
interconnect does so well. If you'd like to argue about it,
technically, I'd be happy to do so. No need to keep quiet.

My reservation was about the way you present it, not the technical idea behind it. Actually, my real concern was that there was no technical content in your post, just references to white papers, i.e. marketing fluff.

An offer for "getting a secret white paper on request" is marketing, you are right. But at least the SPEC number was technical content - and we don't want to analyse every posting sentence-by-sentence, do we?

So, let's finally talk about the technical part. You claim that the key metric in your product is the messaging rate, i.e. the number of packets you can send per second. You even have a fancy name for it, something like Hyper Duper Messaging :-)

[...]
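Since neither side has published the actual benchmark code behind such a "messages per second" number, here is, just to make the discussion concrete, a generic sketch of what is usually measured. This is only my own illustration; the message size, window depth and iteration count are arbitrary values, not anyone's real benchmark.

/* Generic small-message rate sketch -- my own illustration, not any
 * vendor's actual benchmark. Rank 0 streams windows of non-blocking
 * sends to rank 1, which acknowledges each window; the result is
 * reported as messages per second. Run with exactly 2 ranks, e.g.
 * mpicc -O2 msgrate.c && mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

#define MSG_SIZE   8      /* "a few doubles" worth of payload          */
#define WINDOW     64     /* messages in flight before synchronizing   */
#define ITERATIONS 10000  /* windows per timing run                    */

int main(int argc, char **argv)
{
    int rank, i, w;
    char buf[WINDOW][MSG_SIZE], ack = 0;
    double t0, t1;
    MPI_Request req[WINDOW];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();

    for (i = 0; i < ITERATIONS; i++) {
        if (rank == 0) {
            for (w = 0; w < WINDOW; w++)
                MPI_Isend(buf[w], MSG_SIZE, MPI_CHAR, 1, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            /* per-window ack keeps the sender from running arbitrarily
             * far ahead of the receiver */
            MPI_Recv(&ack, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            for (w = 0; w < WINDOW; w++)
                MPI_Irecv(buf[w], MSG_SIZE, MPI_CHAR, 0, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(&ack, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
        }
    }

    t1 = MPI_Wtime();
    if (rank == 0)
        printf("message rate: %.0f messages/s\n",
               (double)ITERATIONS * WINDOW / (t1 - t0));

    MPI_Finalize();
    return 0;
}

The number you get this way depends heavily on the window depth and on how many unexpected messages the MPI library tolerates, which is part of why I think the benchmark itself needs to be published.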

Let me summarize what I consider the key issues:
- Explicit MPI_Irecv/MPI_Send/MPI_Wait, or similar patterns used implicitly in MPI_Reduce/MPI_Alltoall/MPI_Allreduce with small messages (a few doubles, or a few kB), are the dominant communication pattern in many MPI applications. There are quite a few studies (though not as many as one could wish) that show this.
- This means it is generally a good thing if the "ping" latency (the duration of MPI_Send in CPU cycles) is as low as possible.
- At this message size, CPU utilization or overlapping computation and communication is not relevant, as (zero-copy) RDMA does not pay off until the message reaches at least some (typically >32 or more) kB in size, due to the implied pinning and rendezvous overhead. Also, MPI_Send offers no opportunity for overlap, and having a progress thread on the receiving CPU steal cycles from the application doesn't really help either.
- In these cases, all(?) interconnects do some sort of memcpy() within MPI_Send to get rid of the data. The differences are (see the timing sketch after this list):
 * How long does it take to prepare things for the memcpy()? This is Greg's message rate.
 * When does the data arrive at the destination?
- But you never want to send millions of messages at once. This is micro-benchmarking at its best. It gives some indications, but seen alone, it is no proof of anything.
- *If* you feel you need to use such a new metric for whatever reason, you should at least publish the benchmark used to gather these numbers, to allow others to do comparative measurements. This goes to Greg.
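For the two sub-points above, the usual way to get numbers is a plain ping-pong plus a timing of MPI_Send itself. Again only a rough sketch with arbitrary constants; the MPI_Send-overhead part is only meaningful as long as the library really does an eager copy and no flow control kicks in.

/* Sketch of the two timings mentioned above (run with exactly 2 ranks):
 *   1. half round-trip latency  -- "when does the data arrive?"
 *   2. time spent inside MPI_Send for a small (eager) message
 *      -- "how long to prepare things for the memcpy()?"
 * Message size and iteration count are arbitrary illustration values. */
#include <mpi.h>
#include <stdio.h>

#define MSG_SIZE 8
#define ITERS    10000

int main(int argc, char **argv)
{
    int rank, i;
    char buf[MSG_SIZE];
    double t0, halfrtt = 0.0, send_overhead = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 1. classic ping-pong, report half the round-trip time */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    halfrtt = (MPI_Wtime() - t0) / (2.0 * ITERS);

    /* 2. sender-side overhead: MPI_Send returns as soon as the small
     *    message has been copied away (eager protocol assumed) */
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
        t0 = MPI_Wtime();
        for (i = 0; i < ITERS; i++)
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        send_overhead = (MPI_Wtime() - t0) / ITERS;
        printf("half round-trip: %.2f us   MPI_Send overhead: %.2f us\n",
               halfrtt * 1e6, send_overhead * 1e6);
    } else if (rank == 1) {
        for (i = 0; i < ITERS; i++)
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Multiplying the second number by the clock rate gives the "duration of MPI_Send in CPU cycles" I mean above.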

But I don't think that Greg's "Real Application Performance" white paper is infamous. It states where the data comes from, you have to trust him for his own numbers, and it does not directly link the differences in application performance to the messaging rate. Of course, it does not offer a scientific analysis, and you cannot compare it to papers like the ones from Leonid Oliker. But I don't think it's unfair, and it surely stimulates the competition for better technical solutions or better white papers.

--
Joachim - reply to joachim at domain ccrl-nece dot de

Opinion expressed is personal and does not constitute
an opinion or statement of NEC Laboratories.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
