On Wed, 2016-04-06 at 15:26 +0100, Edward Cree wrote:
> On 06/04/16 14:53, Tom Herbert wrote:
> > But again, this scheme is optimizing for forwarding case and doesn't
> > help (and probably hurts) the use case of locally terminated
> > connections
> Not really. I think this has a chance to outperform GRO for locally
> terminated connections, for a number of reasons:
> * Doesn't look at higher-layer or inner headers until later in packet
>   processing, for instance we (maybe) process every L3 header in a NAPI poll
>   before looking at a single L4. This could delay touching the second
>   cacheline of some packets.
> * Doesn't have to re-write headers to produce a coherent superframe.
>   Instead, each segment carries its original headers around with it. Also
>   means we can skip _checking_ some headers to see if we're 'allowed' to
>   coalesce (e.g. TCP TS differences, and the current fun with IP IDs).
> * Can be used for protocols like UDP where the original packet boundaries
>   are important (so you can't coalesce into a superframe).
> Really the last of those was the original reason for this idea, helping with
> forwarding is just another nice bonus that we (might) get from it.
> And it's all speculative and I don't know for sure what the performance
> would be like, but I won't know until I try it!

Look at the mess of some helpers in net/core/skbuff.c, and imagine the
super mess it would become with a concept of 'super packet with various
headers on each segment'.

netfilter is already complex; it would become a nightmare.

GRO, on the other hand, presents one virtual set of headers, then the
payload in one or multiple frags.
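
To make the contrast concrete, here is a rough userspace sketch of the two layouts. Everything in it is hypothetical: the `toy_*` names, the two-field header and the merge rule are made up for illustration and are not the kernel's struct sk_buff or the real GRO code. It only shows that the superframe view leaves a consumer one header to parse, while the per-segment view leaves one header per segment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types for illustration; NOT the kernel's struct sk_buff. */
struct toy_hdr {
	unsigned int seq;	/* sequence number of the first payload byte */
	size_t len;		/* payload bytes described by this header */
};

struct toy_frag {
	const unsigned char *data;
	size_t len;
};

#define TOY_MAX_FRAGS 8

/* GRO-like view: one virtual header, payload spread over frags. */
struct toy_superframe {
	struct toy_hdr hdr;
	struct toy_frag frags[TOY_MAX_FRAGS];
	size_t nr_frags;
};

/* Per-segment view: every segment keeps its own original header. */
struct toy_segment {
	struct toy_hdr hdr;
	struct toy_frag payload;
	struct toy_segment *next;
};

/* Start a superframe from the first segment of a flow. */
static void toy_superframe_init(struct toy_superframe *sf,
				const struct toy_segment *seg)
{
	sf->hdr = seg->hdr;
	sf->frags[0] = seg->payload;
	sf->nr_frags = 1;
}

/*
 * Append seg if it continues the byte stream and a frag slot is free.
 * Returns 1 on merge, 0 if the segment must be handled separately.
 */
static int toy_superframe_merge(struct toy_superframe *sf,
				const struct toy_segment *seg)
{
	assert(sf->nr_frags >= 1);
	if (sf->nr_frags == TOY_MAX_FRAGS)
		return 0;
	if (seg->hdr.seq != sf->hdr.seq + (unsigned int)sf->hdr.len)
		return 0;
	sf->frags[sf->nr_frags++] = seg->payload;
	sf->hdr.len += seg->hdr.len;	/* single header now covers both payloads */
	return 1;
}

/* Number of headers a consumer has to look at in a segment list. */
static size_t toy_list_header_count(const struct toy_segment *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

With three segments where only the first two are in sequence, the superframe ends up with one header and two frags, while walking the list still means looking at three headers. Any helper that splits, trims or rewrites packets would have to keep all of those per-segment headers consistent.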