Hi, On Mon, 22 Aug 2016 14:58:42 +0200, f...@strlen.de wrote: > Shmulik Ladkani <shmulik.ladk...@gmail.com> wrote: > > There are cases where gso skbs (which originate from an ingress > > interface) have a gso_size value that exceeds the output dst mtu: > > > > - ipv4 forwarding middlebox having in/out interfaces with different mtus > > addressed by fe6cc55f3a 'net: ip, ipv6: handle gso skbs in forwarding > > path' > > - bridge having a tunnel member interface stacked over a device with small > > mtu > > addressed by b8247f095e 'net: ip_finish_output_gso: If > > skb_gso_network_seglen exceeds MTU, allow segmentation for local udp > > tunneled skbs' > > > > In both cases, such skbs are identified, then go through early software > > segmentation+fragmentation as part of ip_finish_output_gso. > > > > Another approach is to shrink the gso_size to a value suitable so > > resulting segments are smaller than dst mtu, as suggeted by Eric > > Dumazet (as part of [1]) and Florian Westphal (as part of [2]). > > > > This will void the need for software segmentation/fragmentation at > > ip_finish_output_gso, thus significantly improve throughput and lower > > cpu load. > > > > This RFC patch attempts to implement this gso_size clamping. > > > > [1] https://patchwork.ozlabs.org/patch/314327/ > > [2] https://patchwork.ozlabs.org/patch/644724/ > > > > Signed-off-by: Shmulik Ladkani <shmulik.ladk...@gmail.com> > > --- > > > > Florian, in fe6cc55f you described a BUG due to gso_size decrease. > > I've tested both bridged and routed cases, but in my setups failed to > > hit the issue; Appreciate if you can provide some hints. > > Still get the BUG, I applied this patch on top of net-next.
The BUG occurs when GRO occurs on the ingress, and only if GRO merges skbs into the frag_list (OTOH when segments are only placed into frags[] of a single skb, skb_segment succeeds even if gso_size was altered). This is due to an assumption that the frag_list members terminate on exact MSS boundaries (which no longer holds during gso_size clamping). The assumption is dated back to original support of 'frag_list' to skb_segment: 89319d3801 net: Add frag_list support to skb_segment This patch adds limited support for handling frag_list packets in skb_segment. The intention is to support GRO (Generic Receive Offload) packets which will be constructed by chaining normal packets using frag_list. As such we require all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer in the kernel of such packets, namely GRO, this requirement should not be difficult to maintain. We have few alternatives for gso_size clamping: 1 Fix 'skb_segment' arithmentics to support inputs that do not match the "frag_list members terminate on exact MSS" assumption. 2 Perform gso_size clamping in 'ip_finish_output_gso' for non-GROed skbs. Other usecases will still benefit: (a) packets arriving from virtualized interfaces, e.g. tap and friends; (b) packets arriving from veth of other namespaces (packets are locally generated by TCP stack on a different netns). 3 Ditch the idea, again ;) We can persue (1), especially if there are other benefits doing so. OTOH due to the current complexity of 'skb_segment' this is bit risky. Going with (2) could be reasonable too, as it brings value for the virtualized environmnets that need gso_size clamping, while presenting minimal risk. Appreciate your opinions. Regards, Shmulik