At 17:55 24.04.2007, Ashley Pittman wrote:
That would explain why QLogic use PIO for up to 64k messages and we
switch to DMA at only a few hundred.  For small messages you could best
describe what we use as a hybrid of the above descriptions: we write the
network packet across the PCI bus and don't DMA at all.

I assume QsNet has to do something with the packet after it has been written to the HCA. Since the outbound PCI address space is only 32 bits (who needs more than 4 GB of CSR space, other than cluster people attempting to map all the accumulated memory of the nodes in the cluster into a single address space?), I assume QsNet uses part of the packet as 64-bit address information and starts a DMA from the HCA local buffer to the remote destination.
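
To make the guess above concrete, here is a minimal sketch of a packet that carries a 64-bit remote address in front of its payload. The layout (one little-endian 64-bit field, then the data) and the names build_packet and parse_packet are purely hypothetical; this is not the actual QsNet packet format, which is not described in this thread.

```python
import struct

# Hypothetical header: a single little-endian 64-bit remote address.
HEADER = struct.Struct("<Q")


def build_packet(remote_addr, payload):
    """Prefix the payload with the (assumed) 64-bit destination address."""
    return HEADER.pack(remote_addr) + payload


def parse_packet(packet):
    """Split a packet back into its remote address and payload."""
    (remote_addr,) = HEADER.unpack_from(packet)
    return remote_addr, packet[HEADER.size:]


# An address above 4 GB does not fit in a 32-bit window but fits in the header.
packet = build_packet(0x1_0000_0000, b"hello")
print(len(packet), parse_packet(packet))
```

The point is only that the host can address more than 32 bits of remote memory by placing the address in the data it writes, leaving the adapter to interpret it.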

The downside to PIO of course is you need a CPU to drive it so besides
the fact it's slow you can't do anything asynchronously.

This is a classic tradeoff. Most applications _create_ the message before it is sent (contrary to many p2p benchmarks). Hence, it resides in the L1 or L2 cache of the CPU in the (MOESI) Modified state. It is then very efficient to use the CPU to read its local cache and write the message through the WC (write-combining) buffer. In contrast, with DMA the HCA has to issue a DMA read to memory, the CPU cache(s) are snooped, and data is transferred to memory _and_ to the HCA. The cache line ends up in the Shared state, and a bus transaction is required to make it Modified again (when the buffer is written the next time).
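
The argument above can be sketched as a toy model that tracks the state of one cache line over repeated sends and counts only the coherence-related bus transactions (not the PIO writes themselves). It follows the sequence as described in this message; real chipsets may behave differently (some MOESI implementations, for example, may leave a snooped Modified line in the Owned state instead of Shared). The function names are made up for illustration.

```python
def send_once(state, mode):
    """Model one 'write buffer, then send' cycle for a single cache line.

    Returns (new_state, coherence_transactions).
    """
    if mode not in ("pio", "dma"):
        raise ValueError("mode must be 'pio' or 'dma'")
    transactions = 0
    if state != "M":
        # The CPU must regain ownership before it can write the buffer again.
        transactions += 1
        state = "M"
    if mode == "dma":
        # The HCA's DMA read snoops the cache: the line is written back
        # and, per the description above, left in the Shared state.
        transactions += 1
        state = "S"
    # With PIO the CPU reads its own cache; the line simply stays Modified.
    return state, transactions


def total_transactions(mode, sends):
    """Sum coherence transactions over a number of consecutive sends."""
    state, total = "M", 0
    for _ in range(sends):
        state, n = send_once(state, mode)
        total += n
    return total


print("pio:", total_transactions("pio", 3))
print("dma:", total_transactions("dma", 3))
```

In this model PIO incurs no coherence traffic at all, while DMA pays for the snooped read on every send plus an ownership upgrade on every send after the first.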

That's an interesting theory, but I suspect your numbers are a little
out.  My own measurements put a PIO word write in the region of .15 uSec
depending on chipset.  Of course if you are right then the remaining PIO
write is happening in 1 uSec which leaves only .2uSec for the network
which seems a little fast to me.

Just to make sure we compare the same thing: is the .15 usec the time from the CPU issuing the store instruction until the side effect is visible in the HCA? In other words, assuming a CSR word read takes 0.5 usec, a loop writing and reading the same CSR takes 0.65 usec, right? If that is the case, CSR accesses have improved radically in the last few years.
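
The arithmetic behind that question, as a small sketch: if the read stalls until the preceding write has reached the device, the write cost can be estimated as the write-plus-read loop time minus the read-only time. The 0.5 usec read figure is the assumed value from the paragraph above, not a measurement, and the function name is made up.

```python
def posted_write_latency_us(write_read_loop_us, read_only_us):
    """Estimate store-to-visible latency from two loop timings (in usec)."""
    return write_read_loop_us - read_only_us


# Assumed figures from the discussion: 0.65 usec per write+read iteration,
# 0.5 usec per read-only iteration.
estimate = posted_write_latency_us(0.65, 0.5)
print(f"{estimate:.2f} usec")
```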



Håkon





_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
