So, for whoever replaces Greg:
I understand that your cards can't interrupt at all.
Users just have to wait until other messages have passed over the wire
before they receive a very short message (one that, for example, aborts the entire job).

In short, if someone else on the cluster is streaming data to some nodes,
then you have a major latency problem.

Vincent

From: Greg Lindahl
To: Vincent Diepeveen
On Fri, Aug 04, 2006 at 11:19:42AM +0100, Vincent Diepeveen wrote:

What is that time 'in between' for your specific card?

Zero. That's the whole point of the message-rate benchmark, and a
unique aspect of the interconnect that I designed.

Now please stop emailing me personally; you know I find you extremely
annoying.

So the time needed to *interrupt* the current long message.

Our interconnect uses no interrupts.

The cruel reality of trying to scale to 100% on a network is that you can't make a special thread just nonstop checking for an MPI message, like all you guys do for your ping-pong measurements.

We do not ever use a special thread.

That is the REAL problem.

The real problem is that you do not understand that you don't know
everything.

Now, as I said earlier, never email me personally.

-- greg

----- Original Message ----- From: "Vincent Diepeveen" <[EMAIL PROTECTED]>
To: "Greg Lindahl" <[EMAIL PROTECTED]>
Sent: Friday, August 04, 2006 11:19 AM
Subject: Re: [Beowulf] Correct networking solution for 16-core nodes


Yeah, you meant it's 200 usec latency.

When all 16 cores want something from the card and that card is serving 16 threads, then 200 usec is probably the minimum latency from the moment one long MPI message (say about 200 MB) starts arriving until some other thread can receive a very short message "in between".

What is that time 'in between' for your specific card?

So the time needed to *interrupt* the current long message.

The cruel reality of trying to scale to 100% on a network is that you can't make a special thread just nonstop checking for an MPI message, like all you guys do for your ping-pong measurements.

If you have 16 cores, you want to run 16 processes on those 16 cores. A 17th thread already means time-slicing, and an 18th thread is doing I/O to and from the user. If a 19th thread checks regularly for MPI messages from other nodes and is not currently in the run queue, the OS already has a wakeup latency of some 10 milliseconds just to put that thread back in the run queue.

That is the REAL problem.

Thanks to that run-queue latency, you just can't dedicate a special thread to short messages if you want to use all the cores.

So the only solution for that is polling from the working processes themselves.

For non-embarrassingly-parallel software that needs to poll for short messages, the time a single poll takes,
just to check whether a tiny message is there, is therefore CRUCIAL.
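Something like this is what I mean; a rough sketch only (assuming MPI_Iprobe as the polling call, with a made-up ABORT_TAG and a dummy work loop just for illustration):

    /* Sketch: each worker process interleaves its own computation with a
     * cheap non-blocking probe for short control messages, instead of
     * dedicating a separate listener thread. */
    #include <mpi.h>

    #define ABORT_TAG 99                      /* illustrative control tag */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int done = 0;
        for (long iter = 0; iter < 100000000 && !done; ++iter) {
            /* ... do one slice of the real computation here ... */

            /* The cost of this single call is the poll time in question. */
            int flag = 0;
            MPI_Status status;
            MPI_Iprobe(MPI_ANY_SOURCE, ABORT_TAG, MPI_COMM_WORLD, &flag, &status);

            if (flag) {
                int cmd;
                MPI_Recv(&cmd, 1, MPI_INT, status.MPI_SOURCE, ABORT_TAG,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                done = 1;                     /* e.g. "abort the entire job" */
            }
        }

        MPI_Finalize();
        return 0;
    }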

If it is a read from local RAM (local to that processor) taking 0.13 us, then that already slows the
program down a tad.

Preferably most of these polls are served from the L2 cache, which costs a cycle or 13.

It's quite interesting to know which card/implementation has the fastest poll time here for processes that regularly poll for short messages,
including the overhead of checking for overflow of the given protocol.

If that's 0.5 us because you have to check for all kinds of MPI overflow, then that sucks a lot. Such a card I'd throw away directly.
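To get that number for a given card you can time the empty poll itself; a rough micro-benchmark sketch (just averaging MPI_Iprobe calls in one process, an assumption on my part, not any vendor's benchmark):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        const long n = 1000000;               /* number of polls to average over */
        int flag;
        MPI_Status status;

        double t0 = MPI_Wtime();
        for (long i = 0; i < n; ++i)
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
        double t1 = MPI_Wtime();

        printf("average poll time: %.3f us\n", 1e6 * (t1 - t0) / n);

        MPI_Finalize();
        return 0;
    }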

Vincent

----- Original Message ----- From: "Greg Lindahl" <[EMAIL PROTECTED]>
To: "Joachim Worringen" <[EMAIL PROTECTED]>; <beowulf@beowulf.org>
Sent: Thursday, August 03, 2006 10:07 PM
Subject: Re: [Beowulf] Correct networking solution for 16-core nodes


On Thu, Aug 03, 2006 at 12:53:40PM -0700, Greg Lindahl wrote:

We have clearly stated that the Mellanox switch is around 200 usec per
hop.  Myricom's number is also well known.

Er, 200 nanoseconds. Y'all know what I meant, right? :-)

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


