On Tue, Sep 3, 2019 at 11:52 AM Shmulik Ladkani <shmu...@metanetworks.com> wrote:
>
> On Sun, 1 Sep 2019 16:05:48 -0400
> Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote:
>
> > One quick fix is to disable sg and thus revert to copying in this
> > case. Not ideal, but better than a kernel splat:
> >
> > @@ -3714,6 +3714,9 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >         sg = !!(features & NETIF_F_SG);
> >         csum = !!can_checksum_protocol(features, proto);
> >
> > +       if (list_skb && skb_headlen(list_skb) && !list_skb->head_frag)
> > +               sg = false;
> > +
>
> Thanks Willem.
>
> I followed this approach, and further refined it based on the conditions
> that lead to this BUG_ON:
>
> - existence of frag_list
> - mangled gso_size (using SKB_GSO_DODGY as a hint)
> - some frag in the frag_list has a linear part that is NOT head_frag,
>   or length not equal to the requested gso_size
>
> BTW, doing so allowed me to refactor a loop that tests for similar
> conditions in the !(features & NETIF_F_GSO_PARTIAL) case, where an
> attempt is made to execute partial splitting at the frag_list pointer
> (see 07b26c9454a2 and 43170c4e0ba7).
>
> I've tested this using the reproducer, with various linear skbs in
> the frag_list and different gso_size mangling. All resulting 'segs'
> looked correct. Did not test on a live system yet.
>
> Comments are welcome.
>
> Specifically, I would like to know whether we can
> - better refine the condition where this "sg=false fallback" needs
>   to be applied
> - consolidate my new 'list_skb && (type & SKB_GSO_DODGY)' case with
>   the existing '!(features & NETIF_F_GSO_PARTIAL)' case
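The three conditions listed above can be expressed as a single predicate over a simplified user-space model. This is only an illustration, not the patch under discussion: `struct model_skb`, `MODEL_GSO_DODGY` and `needs_copy_fallback_all()` are hypothetical names standing in for `struct sk_buff`, `skb_headlen()`, `head_frag` and `SKB_GSO_DODGY`, and "length" is assumed here to mean the linear length of each frag_list member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct sk_buff: only the
 * fields the fallback condition reads. Not the kernel layout. */
struct model_skb {
	unsigned int headlen;    /* models skb_headlen(skb) */
	bool head_frag;          /* models skb->head_frag */
	struct model_skb *next;  /* models the frag_list chain */
};

/* Models the SKB_GSO_DODGY bit in gso_type; the value is arbitrary. */
#define MODEL_GSO_DODGY 0x1u

/* Return true if segmentation should fall back to copying (sg = false):
 * a frag_list exists, gso_size may have been mangled (DODGY hint), and
 * some frag_list member has a linear part that is not head_frag or
 * whose length (assumed: linear length) differs from mss. */
static bool needs_copy_fallback_all(const struct model_skb *list_skb,
				    unsigned int gso_type, unsigned int mss)
{
	const struct model_skb *iter;

	assert(mss > 0); /* the model assumes a non-zero gso_size */

	if (!list_skb || !(gso_type & MODEL_GSO_DODGY))
		return false;

	/* Unlike a check on list_skb alone, walk every member. */
	for (iter = list_skb; iter; iter = iter->next) {
		if (!iter->headlen)
			continue; /* no linear part: nothing to check */
		if (!iter->head_frag || iter->headlen != mss)
			return true;
	}
	return false;
}
```

Because the loop visits every member, a list whose first member looks fine but whose later member has a non-head_frag linear part still triggers the fallback.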
This is a lot more code change. Especially for stable fixes that need to be backported, a smaller patch is preferable.

My suggestion only checks the first frag_skb in the list. If a list can be created where the first frag_skb is head_frag but a later one is not, it will fall short. I kind of doubt that. By default, skb_gro_receive builds GSO skbs that can be segmented along the original gso_size boundaries. We have so far only observed this issue when messing with gso_size.

We can easily refine the test to fall back to copying only if skb_headlen(list_skb) != mss. Alternatively, applying the fallback only to SKB_GSO_DODGY skbs is fine, too.

I suggest we stick with the two-liner.
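The two-liner and the two refinements mentioned in this reply can be compared side by side in a small user-space model. This is a sketch, not kernel code: `struct model_skb`, `MODEL_GSO_DODGY` and the `fallback_*()` functions are hypothetical names, and reading "refine the test" as adding `headlen != mss` on top of the existing check is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the first frag_list member (list_skb);
 * only the fields the condition reads. Not the kernel layout. */
struct model_skb {
	unsigned int headlen;  /* models skb_headlen(list_skb) */
	bool head_frag;        /* models list_skb->head_frag */
};

/* Models the SKB_GSO_DODGY bit in gso_type; the value is arbitrary. */
#define MODEL_GSO_DODGY 0x1u

/* The two-liner as quoted: inspects only the first frag_list member. */
static bool fallback_two_liner(const struct model_skb *list_skb)
{
	return list_skb && list_skb->headlen && !list_skb->head_frag;
}

/* Refinement 1 (assumed reading): additionally require that the linear
 * length of the first member differs from mss. */
static bool fallback_headlen_mismatch(const struct model_skb *list_skb,
				      unsigned int mss)
{
	assert(mss > 0); /* the model assumes a non-zero gso_size */
	return fallback_two_liner(list_skb) && list_skb->headlen != mss;
}

/* Refinement 2: apply the fallback only to DODGY (untrusted) skbs. */
static bool fallback_dodgy_only(const struct model_skb *list_skb,
				unsigned int gso_type)
{
	return (gso_type & MODEL_GSO_DODGY) && fallback_two_liner(list_skb);
}
```

All three variants stay a one-condition change to the `sg` assignment, which keeps the patch small for backporting; none of them walks past the first frag_list member.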