I guess I figured that the data is relatively small compared to the
bandwidth,

I agree, in principle.  and relatively small compared to the amount of ram
in the switch as well.

whereas the latency for ethernet is relatively high.  I also

not _that_ high, though.  with a little tuning (coalesce parameters),
I think 30-40 us half-rtt is pretty common, even over a normal tcp stack.
yes, that's 2+ 1.5k packets, but it's not _that_ much compared to 1M images.

To make sure there was not an issue with the MPI broadcast, I did one test
run with 5 nodes only sending back 4 bytes of data each.  The result was
an RTT of less than 0.3 ms.

isn't that kind of high?  a single ping-pong latency should be ~50 us -
maybe I'm underestimating the latency of the broadcast itself.


This is quite a bit more than a single ping-pong. The viewer sends to the
master node (rank 0), and then the master node broadcasts to all other
nodes, and then all nodes send back to the viewer node.  I don't know if
this still seems high?

the first message should take <50 us. the broadcast to 5 nodes should take
2-3 more steps of ~50 us each. so at about 200 us, all the slaves will start
the DoS attack on the viewer node's nic...
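
a rough way to check that arithmetic, as a sketch only: it assumes a flat 50 us per message hop, a tree-style broadcast that doubles the number of ranks holding the data each round, and replies that overlap perfectly (the pile-up at the viewer's nic means they won't). the function name and its parameters are made up for illustration, and the 5 nodes are treated as 5 ranks including the master.

```python
def estimate_round_trip_us(n_ranks, hop_us=50.0):
    """Back-of-envelope timing for viewer -> master -> bcast -> replies.

    Assumptions (not measurements): every message hop costs hop_us,
    the broadcast is a tree that doubles the informed ranks each round,
    and all replies to the viewer overlap perfectly.
    """
    # One hop for the viewer's request to reach the master (rank 0).
    viewer_to_master = hop_us

    # ceil(log2(n_ranks)) rounds for a doubling tree; integer math avoids
    # floating-point surprises at exact powers of two.
    bcast_rounds = (n_ranks - 1).bit_length()
    bcast_done = viewer_to_master + bcast_rounds * hop_us

    # Best case: the last rank to receive replies one hop later.
    round_trip = bcast_done + hop_us
    return bcast_done, round_trip


bcast_done, round_trip = estimate_round_trip_us(5)
print(f"bcast done at ~{bcast_done:.0f} us, best-case rtt ~{round_trip:.0f} us")
```

with 5 ranks this puts the end of the broadcast at about 200 us and the best-case round trip at about 250 us, so under these assumptions a measured RTT just under 0.3 ms is not far off.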

But the bcast is always just sending 4 bytes (a single integer), and as

no, afaik no mpi implementations actually utilize the eth-level bcast,
but rather implement bcast as a tree of (unicast) sends.
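
to make "a tree of sends" concrete, here is a minimal sketch of one common pattern, a binomial tree, in which every rank that already has the data forwards it to one new rank per round. this is an illustration only, not the code of any particular mpi implementation (real ones pick among several algorithms depending on message and communicator size); the function name is hypothetical.

```python
def binomial_bcast_schedule(n_ranks, root=0):
    """Return the point-to-point sends of a binomial-tree broadcast.

    The result is a list of rounds; each round is a list of (src, dst)
    rank pairs that can proceed in parallel. Illustrative only.
    """
    rounds = []
    have = 1  # how many ranks (numbered relative to root) hold the data
    while have < n_ranks:
        sends = []
        for rel_src in range(have):
            rel_dst = rel_src + have  # each holder feeds one new rank
            if rel_dst < n_ranks:
                src = (rel_src + root) % n_ranks
                dst = (rel_dst + root) % n_ranks
                sends.append((src, dst))
        rounds.append(sends)
        have *= 2  # the informed set doubles every round
    return rounds


for i, sends in enumerate(binomial_bcast_schedule(5), start=1):
    print(f"round {i}: {sends}")
```

for 5 ranks this takes 3 rounds of unicast sends (0->1, then 0->2 and 1->3, then 0->4), which is where the "2-3 more" message times for the broadcast come from, even though the payload is only 4 bytes.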
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
