The original question was about relatively small messages - only 500 doubles (4000 bytes) each.
You can often get better throughput if you send, say, two smaller messages rather than one large one. This is because the interconnect can generate multiple RDMA requests that proceed concurrently (see the nonblocking-send sketch after the quoted messages below). This old paper from 2003 illustrates this:
http://www.docstoc.com/docs/5579957/Quadrics-QsNetII-A-network-for-Supercomputing-Applications
Page 25 shows a graph where 1, 2, 4 and 8 RDMA requests are issued concurrently. For large messages (>256 KB) there is no significant difference in the achieved total bandwidth - it is limited by the PCIe/PCI-X interface or the interconnect fabric itself. But at smaller message sizes there are measurable differences - e.g. two 1K messages show higher total bandwidth than a single 2K message.

Daniel

p.s. Did you really mean to compare three 500-element transfers with a single 2000-element transfer, rather than the same total message size in both cases?

p.p.s. Case A is really a broadcast - interconnects that implement broadcast in hardware are bound to do A faster than B.

From: beowulf-boun...@beowulf.org On Behalf Of Bruno Coutinho
Sent: 23 May 2009 16:44
To: tri...@vision.ee.ethz.ch
Cc: beowulf@beowulf.org
Subject: Re: [Beowulf] MPI - time for packing, unpacking, creating a message...

If you are using Gigabit Ethernet with jumbo frames (9000 bytes, for example): A will send 3 packets of 4000 bytes, and B will send one of 9000 bytes and one of 7000 bytes. For the CPU, B is better because it will generate one system call while A will generate three, and as many high-speed interconnects today need large packets to fully utilize their bandwidth, I think that B should be faster. But the only way to be sure is testing (see the timing sketch below).

2009/5/18 <tri...@vision.ee.ethz.ch>

Hi all,

is there anyone who can tell me if A) or B) is probably faster?

A) process 0 sends 3x500 elements, e.g. doubles, to 3 different processes using something like

    if (rank == 0) {
        MPI_Send(sendbuf, 500, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
        MPI_Send(sendbuf, 500, MPI_DOUBLE, 2, 2, MPI_COMM_WORLD);
        MPI_Send(sendbuf, 500, MPI_DOUBLE, 3, 3, MPI_COMM_WORLD);
    } else {
        MPI_Recv(recvbuf, 500, MPI_DOUBLE, 0, rank, MPI_COMM_WORLD, &status);
    }

B) process 0 sends 2000 elements to process 1 using

    if (rank == 0)
        MPI_Send(sendbuf, 2000, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
    else
        MPI_Recv(recvbuf, 2000, MPI_DOUBLE, 0, rank, MPI_COMM_WORLD, &status);
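A minimal sketch of the splitting idea Daniel describes, assuming a 2000-double transfer from rank 0 to rank 1; the function name, the tags and the two-way split are illustrative, not from the thread:

    /* Hypothetical illustration: split one 2000-double transfer to rank 1
     * into two 1000-double nonblocking sends, so the interconnect can
     * service both RDMA requests concurrently.
     * Assumes sendbuf/recvbuf hold at least 2000 doubles. */
    #include <mpi.h>

    void send_split(double *sendbuf, double *recvbuf, int rank)
    {
        MPI_Request req[2];

        if (rank == 0) {
            /* Two halves of the same buffer, posted back to back. */
            MPI_Isend(sendbuf,        1000, MPI_DOUBLE, 1, 10, MPI_COMM_WORLD, &req[0]);
            MPI_Isend(sendbuf + 1000, 1000, MPI_DOUBLE, 1, 11, MPI_COMM_WORLD, &req[1]);
            MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        } else if (rank == 1) {
            MPI_Irecv(recvbuf,        1000, MPI_DOUBLE, 0, 10, MPI_COMM_WORLD, &req[0]);
            MPI_Irecv(recvbuf + 1000, 1000, MPI_DOUBLE, 0, 11, MPI_COMM_WORLD, &req[1]);
            MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        }
    }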
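And a minimal sketch of the test Bruno suggests, assuming 4 ranks; the iteration count, tags and plain wall-clock timing are illustrative, and warm-up runs and averaging are left out:

    /* Hypothetical timing harness for the two variants in the question.
     * Run with at least 4 ranks; repeat and average for real measurements. */
    #include <mpi.h>
    #include <stdio.h>

    #define N     2000
    #define ITERS 1000

    int main(int argc, char **argv)
    {
        double sendbuf[N], recvbuf[N], tA, tB;
        int rank, i;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < N; i++) sendbuf[i] = i;

        /* Variant A: three 500-double sends to ranks 1, 2 and 3. */
        MPI_Barrier(MPI_COMM_WORLD);
        tA = MPI_Wtime();
        for (i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(sendbuf, 500, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
                MPI_Send(sendbuf, 500, MPI_DOUBLE, 2, 2, MPI_COMM_WORLD);
                MPI_Send(sendbuf, 500, MPI_DOUBLE, 3, 3, MPI_COMM_WORLD);
            } else if (rank <= 3) {
                MPI_Recv(recvbuf, 500, MPI_DOUBLE, 0, rank, MPI_COMM_WORLD, &status);
            }
        }
        MPI_Barrier(MPI_COMM_WORLD);   /* include delivery on all ranks */
        tA = MPI_Wtime() - tA;

        /* Variant B: one 2000-double send to rank 1. */
        MPI_Barrier(MPI_COMM_WORLD);
        tB = MPI_Wtime();
        for (i = 0; i < ITERS; i++) {
            if (rank == 0)
                MPI_Send(sendbuf, N, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(recvbuf, N, MPI_DOUBLE, 0, rank, MPI_COMM_WORLD, &status);
        }
        MPI_Barrier(MPI_COMM_WORLD);
        tB = MPI_Wtime() - tB;

        if (rank == 0)
            printf("A: %g s  B: %g s  (%d iterations each)\n", tA, tB, ITERS);

        MPI_Finalize();
        return 0;
    }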
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf