Stuart Midgley <[EMAIL PROTECTED]> writes:

>> It does apply, however, many parallel algorithms used today are
>> naturally blocking. Why? Well, complicating your algorithm to overlap
>> communication and computation rarely gives a benefit in practice. So
>> anyone who's tried has likely become discouraged, and most people
>> haven't even tried.
>>
>> -- greg
>
> You comment about overlapping computation and communication is interesting.
> As the number of cores per address space goes up, the chance that overlapping
> computation with communication actually gives you anything also
> decreases... memory copies require CPU intervention (unless you offload it to
> your NIC which then means you suffer the normal latencies/message rates etc
> there).
>
> Sure, you can offload the copy to the NIC on some interconnects (eg. Quadrics)
> but I personally found that the increased latency and decreased bandwidth of
> the copy affected performance more than not overlapping.

But overlapping compute and I/O, while nice, isn't the point. The point is to
have a buffer so your processes don't have to run in rigid lockstep. That lets
you bury OS jitter and communication latency, because you always have some
work to do. If it all happens on one machine, that is fine.

By receiving asynchronously you can pick up the data when you are ready for
it, so none of your processes needs to block waiting for another, and ideally
you are always busy doing useful work.

So no, I don't think it is a waste of time when you have lots of cores per
node. There is less latency to bury and lower odds of processes getting out of
lockstep, so the win is smaller until you go off node, but that is about it.

What I don't have a solid grasp on is what the data models of current
applications look like, so I don't know how hard it is to have several
messages in flight at any one time. It may be that in that case there is
little difference between a synchronous receive and an asynchronous one: the
synchronous receive can just take the data out of the buffer and give it to
the application, so you would only block if there were no more messages, and
usually there would be a message waiting, so that would work.

The only place I know of where asynchronous message reception makes a lot of
sense is when you have several channels whose messages could arrive at the
same time, and you could then arrange for them to be processed in any order.
I need to go back and see what the MPI primitives for that kind of operation
are.

Eric
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf