On Fri, Oct 03, 2008 at 10:00:05AM -0400, Mark Hahn wrote:

> qdr would be 40 Gb (raw data rate, right? so 4 GB/s before any sort
> of packet overhead, etc.) I don't really see why that's a problem -
> even a memory-constrained current-gen Intel box has about twice that much
> memory bandwidth available. AMD or next-gen-Intel will
> be even less constrained.

Let's say that I'm sending data which is in a cache. So when the HCA
does the DMA operation, all the bytes have to be flushed from cache to
main memory, and then transferred from main memory to the HCA. And the
system isn't idle while you do this, so these transfers are less
efficient than you might think.

Of course, your "latency and bandwidth" benchmark won't see this
problem, because it only uses a single core, and it sends the same
buffer over and over without touching it.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
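
A rough back-of-envelope sketch of the numbers in this thread, not something either poster wrote. It assumes 8b/10b line coding on a 40 Gb/s 4x QDR link, which gives the 4 GB/s payload rate quoted above. It also assumes the model from the reply: each sent byte crosses the memory bus twice (cache to main memory, then main memory to the HCA). The 8 GB/s memory bandwidth is simply "about twice" the link rate, as in the quoted message. All function and variable names are made up for illustration.

```python
# Back-of-envelope model only. Every figure is an assumption taken from
# the thread, not a measurement.

RAW_GBIT_PER_S = 40.0   # 4x QDR signalling rate, as quoted
DATA_BITS = 8           # 8b/10b line coding: 8 payload bits ...
CODE_BITS = 10          # ... carried in every 10 bits on the wire


def link_payload_gbytes(raw_gbit=RAW_GBIT_PER_S):
    """Payload rate in GB/s after line coding, before packet overhead."""
    return raw_gbit * DATA_BITS / CODE_BITS / 8


def memory_traffic_gbytes(link_gbytes, bus_crossings=2):
    """Memory-bus traffic if every sent byte crosses the bus
    bus_crossings times (write-back from cache, then DMA read)."""
    return link_gbytes * bus_crossings


link = link_payload_gbytes()
traffic = memory_traffic_gbytes(link)
memory_bandwidth = 2 * link   # "about twice that much", per the quote
share = traffic / memory_bandwidth

print(f"link payload rate:       {link:.1f} GB/s")
print(f"memory traffic for send: {traffic:.1f} GB/s")
print(f"share of memory bandwidth used: {share:.0%}")
```

Under these assumptions the send path alone accounts for all of the assumed memory bandwidth, before any other core touches memory. A benchmark that resends one untouched buffer from a single core would not exercise this case.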