On 03/16/2018 03:37 PM, Yonghong Song wrote: > > Eric and Daniel, > > I have tried to fix this issue but not really successful. > I tried two hacks: > . if skb_headlen(list_skb) is not 0, we just pull > skb_headlen(list_skb) from the skb to make skb_headlen(list_skb) = 0, or > . if skb_headlen(list_skb) is not 0, we go to the beginning of > the outer loop which will allocate another nskb for this list_skb. > > Both approaches removed the BUG and the packet is able to reach the > remote host. Upon receiving the packet, however, the remote host sends a > reset packet back so connection eventually closed. I did not debug > further on this. > > Considering it is tricky to change skb_segment, I hacked test_bpf > kernel module to reproduce the issue. The change reflects the gso packet > structure I got from mlx5. Maybe you could take a look and suggest a fix or a > direction of how to move forward. > > Thanks! > > ============= PATCH =============== > > -bash-4.2$ git show > commit 41681ab51f85b4a0ea3416a0a62d6bde74f3af4b > Author: Yonghong Song <y...@fb.com> > Date: Fri Mar 16 15:10:02 2018 -0700 > > [hack] hack test_bpf module to trigger BUG_ON in skb_segment. > > "modprobe test_bpf" will have the following errors: > ... > [ 98.149165] ------------[ cut here ]------------ > [ 98.159362] kernel BUG at net/core/skbuff.c:3667! > [ 98.169756] invalid opcode: 0000 [#1] SMP PTI > [ 98.179370] Modules linked in: > [ 98.179371] test_bpf(+) > ... > > The BUG happens in function skb_segment: > ... > 3665 while (pos < offset + len) { > 3666 if (i >= nfrags) { > 3667 BUG_ON(skb_headlen(list_skb));
Note that you do not need to pull data. Chances are high that skb->head is a page fragment, so can be considered as such ;) Look at users of skb_head_is_locked() > 3668 > 3669 i = 0; > 3670 nfrags = > skb_shinfo(list_skb)->nr_frags; > 3671 frag = skb_shinfo(list_skb)->frags; > 3672 frag_skb = list_skb; > 3673 > 3674 BUG_ON(!nfrags); > ... > > The skbs are constructed to mimic what mlx5 may generate. > The packet size/header may not mimic real cases in production. But > the processing flow is similar. > > Signed-off-by: Yonghong Song <y...@fb.com> > > diff --git a/lib/test_bpf.c b/lib/test_bpf.c > index 2efb213..d36a991 100644 > --- a/lib/test_bpf.c > +++ b/lib/test_bpf.c > @@ -6574,6 +6574,67 @@ static bool exclude_test(int test_id) > return test_id < test_range[0] || test_id > test_range[1]; > } > > +static struct sk_buff *build_test_skb(void) { > + u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > + u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN; > + struct sk_buff *skb[2]; > + void *data[2], *page; > + int i, data_size = 8; > + > + page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO); > + if (!page) > + return NULL; > + > + for (i = 0; i < 2; i++) { > + data[i] = kzalloc(headroom + tailroom + data_size, > GFP_KERNEL); > + if (!data[i]) > + return NULL; > + skb[i] = build_skb(data[i], 0); > + if (!skb[i]) { > + kfree(data[i]); > + return NULL; > + } > + skb_reserve(skb[i], headroom); > + skb_put(skb[i], data_size); > + skb[i]->protocol = htons(ETH_P_IP); > + skb_reset_network_header(skb[i]); > + skb_set_mac_header(skb[i], -ETH_HLEN); > + > + skb_add_rx_frag(skb[i], skb_shinfo(skb[i])->nr_frags, > + page, 0, 64, 64); > + } > + > + /* setup shinfo */ > + skb_shinfo(skb[0])->gso_size = 1448; > + skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4; > + skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY; > + skb_shinfo(skb[0])->gso_segs = 0; > + skb_shinfo(skb[0])->frag_list = skb[1]; > + > + /* adjust skb[0]'s len */ > + skb[0]->len += skb[1]->len; > + skb[0]->data_len += skb[1]->data_len; > + skb[0]->truesize += skb[1]->truesize; > + > + return skb[0]; > +} > + > +static void test_skb_segment(void) { > + netdev_features_t features; > + struct sk_buff *skb; > + > + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | > NETIF_F_IPV6_CSUM; > + features |= NETIF_F_RXCSUM; > + skb = build_test_skb(); > + if (!skb) > + pr_info("Failed in test_skb_segment:build_test_skb!"); > + > + if (skb_segment(skb, features)) > + pr_info("Success in test_skb_segment!"); > + else > + pr_info("Failed in test_skb_segment!"); > +} > + > static __init int test_bpf(void) > { > int i, err_cnt = 0, pass_cnt = 0; > @@ -6631,7 +6692,8 @@ static int __init test_bpf_init(void) > if (ret < 0) > return ret; > > - ret = test_bpf(); > + // ret = test_bpf(); > + test_skb_segment(); > > destroy_bpf_tests(); > return ret; > > > ============= END ================ > > > On 3/13/18 6:15 PM, Eric Dumazet wrote: >> >> >> On 03/13/2018 05:35 PM, Eric Dumazet wrote: >>> >>> >>> On 03/13/2018 05:26 PM, Eric Dumazet wrote: >>>> >>>> >>>> On 03/13/2018 05:04 PM, Alexei Starovoitov wrote: >>>>> On 3/13/18 4:27 PM, Eric Dumazet wrote: >>>>>> >>>>>> >>>>>> On 03/13/2018 04:09 PM, Alexei Starovoitov wrote: >>>>>> >>>>>>> we have bpf_skb_proto_6_to_4() that was used by cilium for long time. >>>>>>> It's not clear why it's not crashing there, but we cannot just >>>>>>> reject changing proto in bpf programs now. >>>>>>> We have to fix whatever needs to be fixed in skb_segment >>>>>>> (if bug is there) or fix whatever necessary on mlx5 side. >>>>>>> In bpf helper we mark it as SKB_GSO_DODGY just like packets coming >>>>>>> through virtio would do, so if skb_segment() needs to do something >>>>>>> special with skb the SKB_GSO_DODGY flag is already there. >>>>>> >>>>>> 'Fixing' skb_segment(), I did that a long time ago and Herbert Xu was >>>>>> not happy with the fix and provided something else. >>>>> >>>>> any link to your old patches and discussion? >>>>> >>>>> I think since mlx4 can do tso on them and the packets came out >>>>> correct on the wire, there is nothing fundamentally wrong with >>>>> changing gso_size. Just tricky to teach skb_segment. >>>>> >>>> >>>> The world is not mlx4 only. Some NIC will ask skb_segment() fallback >>>> segmentation for various reasons (like skb->len above a given limit like >>>> 16KB) >>>> >>>> Maybe >>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg255549.html&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=x9lG2k53B7AFsnNx6t2DgbXt06J-sLznZIHk6YdGoGg&s=0Nxx2G8PtDAEMCFAvQ7kxYTXVr9aHdOolP1KB_lnmes&e= >>> >>> >>> Herbert patch : >>> >>> commit 9d8506cc2d7ea1f911c72c100193a3677f6668c3 >>> Author: Herbert Xu <herb...@gondor.apana.org.au> >>> Date: Thu Nov 21 11:10:04 2013 -0800 >>> >>> gso: handle new frag_list of frags GRO packets >>> >> >> I found my initial patch. >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg255452.html&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=x9lG2k53B7AFsnNx6t2DgbXt06J-sLznZIHk6YdGoGg&s=VuWRpUdJwBwTxpnMNZYgKvQANLL5UA7hZnTFZsQlK6c&e= >> >>