On Wednesday 01 February 2006 22:11, David S. Miller wrote:
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: Wed, 1 Feb 2006 19:28:46 +0100
>
> > http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
>
> I did a writeup in my blog about all of this, another good
> reason to actively follow my blog:
>
> http://vger.kernel.org/~davem/cgi-bin/blog.cgi/index.html
>
> Go read.
>
> > -Andi (who prefers sourceware over slideware)
>
> People are definitely hung up on the details, and that means
> they are analyzing Van's work from the absolute _wrong_ angle.
The main reason I look for details is that it's unclear to me whether his
work is one-copy or zero-copy, and how the actual data in the channels is
managed. The netchannels seem to just pass indexes into some other buffer,
so unless he found a much better e1000 than I have :) it's probably a
single copy from the RX ring into another big buffer. Right? Some of the
other stuff sounded like an attempt at zero copy.

How is that other buffer managed? Is it sitting in user space? If so, how
does the data end up in the simulated read() in user space? That would
require another copy, unless I'm missing something. The alternative, if
it's not copy-from-RX-ring into another buffer, would be one big pool
that is always mapped into everybody's address space (assuming no
intelligent NIC queue support) - but that would be insecure, right?
(See the first sketch at the end of this mail for the kind of channel
structure I have in mind.)

Independent of all that, I guess it would be an interesting experiment to
change the socket backlog and TCP prequeue into a linked list of arrays
of skb pointers, and see if that really helps over the doubly-linked
lists (those are the points that should pass skbs between CPUs). The
second sketch below shows what I mean.

Also, the TX part is a bit unclear.

> So when a TCP socket enters established state, we add an entry into
> the classifier.  The classifier is even smart enough to look for
> a listening socket if the fully established classification fails.

I think that's a pretty important detail. The current TCP demultiplex is
a considerable part of the TCP processing cost, and I haven't seen any
good proposals yet to make it faster [except the old one of using a
smaller hash ..]. Is he using some kind of binary tree for this, or a
hash? (Third sketch below.)

> Van is not against NAPI, in fact he's taking NAPI to the next level.
> Softirq handling is overhead, and as this work shows, it is totally
> unnecessary overhead.
>
> Yes we do TCP prequeue now, and that's where the second stage net
> channel stuff hooks into.  But prequeue as we have it now is not
> enough, we still run softirq, and IP input processing from softirq not
> from user socket context.

I don't quite get why this is a problem. The softirq runs on the same
CPU as the interrupt, so it should be pretty cheap (no bounced cache
lines). Due to the way the stacking works, cache locality should also be
OK (except for the big hash tables).

> The RX net channel bypasses all of that
> crap.
>
> The way we do softirq now we can feed one cpu with softirq work given
> a single card, with Van's stuff we can feed socket users on multiple
> cpus with a single card.  The net channel data structure SMP
> friendliness really helps here.

OK, so the point is to not keep all the softirq work on the CPU that has
the interrupt affinity. MSI-X and receive hashing should mostly solve
that anyway, no? But I agree it would be nice to fix it on old hardware
too.

-Andi
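
P.S.: Three sketches to make the questions above concrete. All of this
is my guesswork from the slides, not Van's actual code.

First, the channel itself. The slides describe a lock-free ring with one
producer and one consumer; something like the toy below, where the slots
carry indexes into a separate buffer pool (which is exactly why I'm
asking where that pool lives). Names and sizes are made up, and a real
version needs memory barriers where noted.

#include <stdint.h>

#define CHAN_SLOTS 256			/* must be a power of two */

struct net_channel {
	volatile uint32_t head;		/* written only by the producer */
	volatile uint32_t tail;		/* written only by the consumer */
	uint32_t slot[CHAN_SLOTS];	/* indexes into a buffer pool */
};

/* Producer (driver/IRQ) side: queue one buffer index, return 0 if full.
 * head and tail are free-running counters, so the full/empty tests work
 * across wraparound as long as CHAN_SLOTS is a power of two.
 */
static int chan_put(struct net_channel *c, uint32_t buf_idx)
{
	uint32_t h = c->head;

	if (h - c->tail == CHAN_SLOTS)
		return 0;			/* full: drop or back-pressure */
	c->slot[h & (CHAN_SLOTS - 1)] = buf_idx;
	/* a real version needs a write barrier (smp_wmb) here */
	c->head = h + 1;			/* publish the slot */
	return 1;
}

/* Consumer (user context) side: dequeue one index, return 0 if empty. */
static int chan_get(struct net_channel *c, uint32_t *buf_idx)
{
	uint32_t t = c->tail;

	if (t == c->head)
		return 0;			/* empty */
	/* a real version needs a read barrier (smp_rmb) here */
	*buf_idx = c->slot[t & (CHAN_SLOTS - 1)];
	c->tail = t + 1;
	return 1;
}

Each side writes only its own index, so the two cache lines ping-pong at
most once per batch instead of once per packet - that seems to be the
whole point of the data structure.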
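
Second, the prequeue experiment. Today the prequeue and backlog are
doubly-linked sk_buff lists, so every queue/dequeue dirties the
prev/next pointers in the skb itself plus the list head. A linked list
of small skb pointer arrays would touch one cache line per skb on the
producer side and let the consumer walk the arrays sequentially. Rough
sketch - everything except struct sk_buff is invented for illustration,
and locking is left out (the real prequeue is protected by the socket
lock):

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/skbuff.h>

#define PREQ_BATCH 16

struct skb_batch {
	struct skb_batch *next;
	unsigned int count;
	struct sk_buff *skbs[PREQ_BATCH];
};

struct array_prequeue {
	struct skb_batch *head;		/* consumer end */
	struct skb_batch *tail;		/* producer end */
};

/* Producer: append to the tail batch, opening a new one when full. */
static int prequeue_add(struct array_prequeue *q, struct sk_buff *skb)
{
	struct skb_batch *b = q->tail;

	if (!b || b->count == PREQ_BATCH) {
		b = kzalloc(sizeof(*b), GFP_ATOMIC);
		if (!b)
			return -ENOMEM;
		if (q->tail)
			q->tail->next = b;
		else
			q->head = b;
		q->tail = b;
	}
	b->skbs[b->count++] = skb;
	return 0;
}

/* Consumer: drain everything, processing each array sequentially. */
static void prequeue_drain(struct array_prequeue *q,
			   void (*rcv)(struct sk_buff *))
{
	struct skb_batch *b = q->head;

	while (b) {
		struct skb_batch *next = b->next;
		unsigned int i;

		for (i = 0; i < b->count; i++)
			rcv(b->skbs[i]);
		kfree(b);
		b = next;
	}
	q->head = q->tail = NULL;
}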
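
Third, the demultiplex question. What we do today is essentially: hash
the 4-tuple, then walk a (hopefully short) chain in the established
table. A toy version is below - not the kernel's actual hash function.
If Van's classifier is per-channel and much smaller, the interesting
part is how the miss path (falling back to the listening socket) stays
cheap.

#include <stdint.h>

/* Toy 4-tuple hash: xor-fold the addresses and ports into the table.
 * hash_size must be a power of two.
 */
static unsigned int tcp_demux_hash(uint32_t saddr, uint32_t daddr,
				   uint16_t sport, uint16_t dport,
				   unsigned int hash_size)
{
	uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);

	h ^= h >> 16;
	h ^= h >> 8;
	return h & (hash_size - 1);
}

The cost isn't the arithmetic, it's the cache miss on the big shared
hash table; a small per-connection classifier would turn that miss into
a hit, which is presumably where a lot of Van's win comes from.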