Hi Patrick,
Interesting to learn that you nowadays market Ethernet cards. You
still seem to possess some knowledge of other companies' switches,
too. Congrats.
My faith in the switches and crossbars is actually quite high; not so
much in the MPI cards, however.
Let's assume for now that I was speaking of high-end network cards
that send and receive MPI packets in our Myrinet cluster.
Let's say we've got a cool quad Xeon MP node with 64 logical cores @
3.x GHz. Soon a very popular machine, I'd guess; there isn't really
any other x86 CPU that can take it on head to head. That would be the
upcoming 'Beckton' CPU from Intel, the monster that probably eats
more power than any Intel x86 CPU before it :)
Feeding that monster node, however, is not so easy.
Just look at one node now, please. The MPI card receives one big
packet of a few megabytes. Likewise, a few other threads receive
megabyte-sized packets. At the same time as all this, the card also
gets a packet of a few bytes for thread 42.
How long does it take thread 42 to get its packet?
Will the card first handle all the megabyte-sized packets, or deliver
the quick short packet 'in between' to our "logical core 42"?
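To put some numbers on the question (a back-of-the-envelope Python sketch; the link speed, message sizes and MTU are my assumptions, not measurements from any real card):

```python
# Toy model of the scenario above (all numbers are assumptions):
# three 4 MB messages are ahead of a 64-byte message for thread 42
# on a 10 Gbit/s (Myri-10G class) link.

LINK_BPS = 10e9 / 8          # link bandwidth in bytes/s (10 Gbit/s)
BIG = 4 * 1024 * 1024        # one big message: 4 MB
SMALL = 64                   # the message for thread 42: 64 bytes

def wire_time(nbytes):
    """Serialization time of a message on the link, in seconds."""
    return nbytes / LINK_BPS

# Case 1: strict FIFO -- thread 42 waits behind all three big messages.
fifo_wait = 3 * wire_time(BIG) + wire_time(SMALL)

# Case 2: the fabric interleaves at packet granularity (assume a 4 KB
# MTU), so the small message waits at most one frame already in flight.
MTU = 4096
interleaved_wait = wire_time(MTU) + wire_time(SMALL)

print(f"FIFO:        {fifo_wait * 1e6:9.1f} us")
print(f"Interleaved: {interleaved_wait * 1e6:9.3f} us")
```

Under these assumptions the answer differs by three to four orders of magnitude, which is why the question matters.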
Thanks,
Vincent
P.S. I expect, of course, the answer '42' :)
On Feb 12, 2009, at 2:30 AM, Patrick Geoffray wrote:
Vincent Diepeveen wrote:
All such switch latencies are at least a factor of 50-100
worse than their one-way pingpong latency.
I think you are a bit confused about switch latencies.
There is the crossbar latency that is the time it takes for a
packet to be decoded and routed to the right output port. It is
essentially the difference between the pingpong latency with and
without the crossbar in the middle for the smallest packet size.
Typical crossbar latencies are on the order of 100ns for recent
Ethernot, 200ns for Ethernet. To build a bigger fabric, you need to
connect multiple crossbars into Clos, Fat-tree or Torus topologies.
The end-to-end switch latency is then dependent on the number of
crossbars the packet crosses.
There is the PHY/transceiver latency. That only applies to the edge
of the switch, where a physical cable plugs into a socket. SFP+,
for example, requires serialization compared to QSFP. With fiber,
the transceivers have some overhead. Typical overhead is 250ns per
port for a serial fiber PHY, and almost nothing for parallel copper.
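Putting the two components together (a minimal sketch using the figures above; the topologies and port counts are my illustrative assumptions):

```python
# End-to-end switch latency = (crossbars crossed) * (crossbar latency)
#                           + (fiber ports traversed) * (PHY overhead)
# Numbers from the discussion above: 100ns per recent crossbar,
# 250ns per serial fiber port, ~0 for parallel copper.

CROSSBAR_NS = 100   # per-crossbar decode-and-route latency
FIBER_PHY_NS = 250  # serial fiber transceiver overhead, per port

def switch_latency_ns(crossbars, fiber_ports):
    return crossbars * CROSSBAR_NS + fiber_ports * FIBER_PHY_NS

# Single crossbar, copper in and out: just the crossbar itself.
print(switch_latency_ns(crossbars=1, fiber_ports=0))   # -> 100

# Three-stage Clos/fat-tree, fiber at both edge ports:
print(switch_latency_ns(crossbars=3, fiber_ports=2))   # -> 800
```

So a multi-stage fabric multiplies the crossbar term, while the PHY term is paid only at the edges.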
Another overhead is head-of-line (HOL) blocking. It happens when a
packet has to wait for another one to pass in order to be switched.
This is equivalent to two cars turning onto the same road: one has
to wait for the other to make the turn. This latency can be
high, especially if the packets are large (imagine a couple of
trains instead of cars).
Is that what you call "ugly switch latency"? HOL blocking will
reduce your switch efficiency to ~40% with random traffic. That
means your latency will be about two times higher on average,
assuming all packets have the same size. Where is the factor of 50-100?
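The two-cars analogy can be made concrete with a toy sketch (my own illustration, not a model of any particular switch; the packet size and link speed are assumptions):

```python
# Head-of-line blocking in miniature: two same-size packets arrive at
# the same instant and want the same output port. The second must wait
# for the first to serialize, so its latency doubles -- in line with
# the "about two times higher on average" figure for same-size packets.

PACKET_BYTES = 1500
LINK_BPS = 10e9 / 8                      # 10 Gbit/s, in bytes/s
SLOT = PACKET_BYTES / LINK_BPS           # serialization time of one packet

arrivals = [0.0, 0.0]                    # both packets arrive together
port_free = 0.0                          # when the output port is next idle
latencies = []
for t in arrivals:                       # FIFO through the output port
    start = max(t, port_free)
    port_free = start + SLOT
    latencies.append(port_free - t)

print([l / SLOT for l in latencies])     # -> [1.0, 2.0]
```

Make the packets trains instead of cars (larger SLOT) and the absolute penalty grows proportionally, but it stays a small constant factor, not 50-100.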
My assumption is always: "if a manufacturer doesn't respond, it
must be really bad for his network card".
Maybe they don't respond because the question does not make any sense.
Note that pingpong latency also gets demonstrated in the wrong manner.
A requirement for determining one-way pingpong latency should be
that measuring it eats no CPU time.
You mean blocking on an interrupt? When you go to a restaurant, do
you place your order and go back home to wait for a phone call, or
do you wait at a table? I, for one, sit down and busy poll.
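The two waiting strategies look roughly like this (a toy Python sketch of the idea, not any real MPI library's completion path; an event stands in for the interrupt, a plain flag for the memory word a NIC would write):

```python
# Two ways to wait for a message: block on a notification ("go home
# and wait for the phone call") vs. busy-poll memory ("sit at the
# table"). Busy polling avoids the interrupt/wakeup latency at the
# cost of burning a CPU core.

import threading
import time

done = threading.Event()       # stands in for the interrupt/notification
flag = [False]                 # stands in for a word the NIC writes

def server():
    time.sleep(0.01)           # the "kitchen" takes a while
    flag[0] = True             # completion lands in memory first...
    done.set()                 # ...then the notification fires

threading.Thread(target=server).start()

# Strategy 1: busy polling -- lowest latency, one core pegged.
while not flag[0]:
    pass                       # spin; no syscall, no scheduler wakeup

# Strategy 2: blocking -- frees the CPU, but pays the wakeup cost.
# (Here it returns immediately, since completion already happened.)
done.wait()

print("order served")
```

Real implementations often mix the two: spin for a bounded time, then fall back to blocking if nothing arrives.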
Patrick
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf