On Thu, 2016-04-21 at 09:40 +0200, Steffen Klassert wrote:
> This partly reverts the below mentioned patch because on
> forwarding, such skbs can't be offloaded to a NIC.
>
> We need this to get IPsec GRO for forwarding to work properly,
> otherwise the GRO aggregated packets get segmented again by
> the GSO layer. Although discovered when implementing IPsec GRO,
> this is a general problem in the forwarding path.
>
> -------------------------------------------------------------------------
> commit 8a29111c7ca68d928dfab58636f3f6acf0ac04f7
> Author: Eric Dumazet <eduma...@google.com>
> Date:   Tue Oct 8 09:02:23 2013 -0700
>
>     net: gro: allow to build full sized skb
>
>     skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
>     typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.
>
>     It's relatively easy to extend the skb using frag_list to allow
>     more frags to be appended into the last sk_buff.
>
>     This still builds very efficient skbs, and allows reaching 45 MSS per
>     skb.
>
>     (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional
>     sk_buff)
>
>     High speed TCP flows benefit from this extension by lowering TCP stack
>     cpu usage (less packets stored in receive queue, less ACK packets
>     processed)
>
>     Forwarding setups could be hurt, as such skbs will need to be
>     linearized, although its not a new problem, as GRO could already
>     provide skbs with a frag_list.
>
>     We could make the 65536 bytes threshold a tunable to mitigate this.
>
>     (First time we need to linearize skb in skb_needs_linearize(), we could
>     lower the tunable to ~16*1460 so that following skb_gro_receive() calls
>     build smaller skbs)
>
>     Signed-off-by: Eric Dumazet <eduma...@google.com>
>     Signed-off-by: David S. Miller <da...@davemloft.net>
> ---------------------------------------------------------------------------
>
> Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
> ---
>
> Hi Eric, this is a followup on our discussion at the netdev
> conference. Would you still be ok with this revert, or do
> you think there is a better solution in sight?

Note that some GRO-enabled drivers would still generate a frag_list.
(This happens when they use skbs with some TCP payload in skb->head
and skb->head was allocated with kmalloc().)

We already have the sysctl_max_skb_frags sysctl; we could add a sysctl
that enables or disables GRO building any frag_list.
(Or simply reuse an existing one, like /proc/sys/net/ipv4/ip_forward?)

Here at Google, we increased MAX_SKB_FRAGS, but this is a rather
intrusive change to be upstreamed :(
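
For reference, the sizes quoted in the commit message can be reproduced
with a small userspace sketch. This is not kernel code. The 1448-byte
segment size (a 1460-byte MSS minus 12 bytes of TCP timestamp option) and
the 17 frag slots are assumptions inferred from the "24616 bytes" figure
(17 * 1448 = 24616), the helper names are hypothetical, and headers are
ignored.

```c
#include <assert.h>

/* Assumed values, inferred from the figures in the commit message. */
#define ASSUMED_MSS        1448u   /* 1460-byte MSS minus 12-byte timestamps */
#define ASSUMED_MAX_FRAGS  17u     /* frag slots assumed available per skb */
#define GRO_BYTE_LIMIT     65536u  /* the "65536 bytes threshold" */

/* Payload bytes when every frag slot carries exactly one segment
 * (the pre-frag_list behaviour described in the commit message). */
static unsigned int frags_only_bytes(unsigned int mss, unsigned int max_frags)
{
    assert(mss > 0);
    return mss * max_frags;
}

/* Number of whole segments whose total stays strictly below the limit
 * (the case where frag_list lets GRO keep aggregating). */
static unsigned int segs_under_limit(unsigned int mss, unsigned int limit)
{
    assert(mss > 0 && limit > 0);
    return (limit - 1) / mss;
}
```

With these assumptions, frags_only_bytes(ASSUMED_MSS, ASSUMED_MAX_FRAGS)
gives 24616 and segs_under_limit(ASSUMED_MSS, GRO_BYTE_LIMIT) gives 45,
matching the "typically 24616 bytes" and "45 MSS per skb" figures above.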