On Wed, Apr 26, 2006 at 11:25:01PM -0700, David S. Miller ([EMAIL PROTECTED]) wrote:
> > We approached this from the understanding that an intelligent NIC
> > will be able to transition directly to userspace, which is a major
> > win. 0 copies to userspace would be sweet. I think we can still
> > achieve this using your scheme without *too* much pain.
>
> Understood. What's your basic idea? Just make the buffers in the
> pool large enough to fit the SKB encapsulation at the end?

There are some caveats here that I found while developing the zero-copy
sniffer [1]. The project's goal was to remap skbs into userspace in
real time. While the absolute numbers (posted to netdev@) were really
high, the approach is only applicable to read-only applications.

As was shown in the IOAT thread, data must be warmed in the caches, so
reading from the mapped area will be as fast as memcpy() (read+write),
and copy_to_user() is actually almost equal to memcpy() (benchmarks
were posted to netdev@). And we must add the remapping overhead.

If we want to DMA data from the NIC into a premapped userspace area,
this will run into problems with message sizes, misalignment, slow
reads and so on, so preallocation has even more problems.

This change also requires significant changes in applications, at least
until recv/send are changed, which is not the best thing to do. So I
think that the mapping itself can be done as some additional socket
option, or something else that is not turned on by default.

I do think that the significant win in VJ's tests comes not from
remapping and cache-oriented changes, but from moving all protocol
processing into the process' context.

I fully agree with Dave that it must be implemented step by step, and
the most significant step, IMHO, is moving protocol processing into the
socket's "place". This will force netfilter changes, but I do think
that for the proof-of-concept code we can turn it off.

I will start to work in this direction next week, after aio_sendfile()
is completed.

So, we will have three attempts to write incompatible stacks - and that
is good :)
No one needs an excuse to rewrite something, as I read in Rusty's
blog...

Thanks.

[1]. http://tservice.net.ru/~s0mbre/old/?section=projects&item=af_tlb

-- 
	Evgeniy Polyakov