Eric and Daniel,

I have tried to fix this issue but not really successful.
I tried two hacks:
  . if skb_headlen(list_skb) is not 0, we just pull
skb_headlen(list_skb) from the skb to make skb_headlen(list_skb) = 0, or
  . if skb_headlen(list_skb) is not 0, we go to the beginning of
    the outer loop which will allocate another nskb for this list_skb.

Both approaches removed the BUG and the packet is able to reach the
remote host. Upon receiving the packet, however, the remote host sends a
reset packet back so connection eventually closed. I did not debug
further on this.

Considering it is tricky to change skb_segment, I hacked test_bpf
kernel module to reproduce the issue. The change reflects the gso packet
structure I got from mlx5. Maybe you could take a look and suggest a fix or a direction of how to move forward.

Thanks!

============= PATCH  ===============

-bash-4.2$ git show
commit 41681ab51f85b4a0ea3416a0a62d6bde74f3af4b
Author: Yonghong Song <y...@fb.com>
Date:   Fri Mar 16 15:10:02 2018 -0700

    [hack] hack test_bpf module to trigger BUG_ON in skb_segment.

    "modprobe test_bpf" will have the following errors:
    ...
    [   98.149165] ------------[ cut here ]------------
    [   98.159362] kernel BUG at net/core/skbuff.c:3667!
    [   98.169756] invalid opcode: 0000 [#1] SMP PTI
    [   98.179370] Modules linked in:
    [   98.179371]  test_bpf(+)
    ...

    The BUG happens in function skb_segment:
    ...
    3665                 while (pos < offset + len) {
    3666                         if (i >= nfrags) {
    3667                                 BUG_ON(skb_headlen(list_skb));
    3668
    3669                                 i = 0;
3670 nfrags = skb_shinfo(list_skb)->nr_frags; 3671 frag = skb_shinfo(list_skb)->frags;
    3672                                 frag_skb = list_skb;
    3673
    3674                                 BUG_ON(!nfrags);
    ...

    The skbs are constructed to mimic what mlx5 may generate.
    The packet size/header may not mimic real cases in production. But
    the processing flow is similar.

    Signed-off-by: Yonghong Song <y...@fb.com>

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 2efb213..d36a991 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -6574,6 +6574,67 @@ static bool exclude_test(int test_id)
        return test_id < test_range[0] || test_id > test_range[1];
 }

+static struct sk_buff *build_test_skb(void) {
+       u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+       u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN;
+       struct sk_buff *skb[2];
+       void *data[2], *page;
+       int i, data_size = 8;
+
+       page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+       if (!page)
+               return NULL;
+
+       for (i = 0; i < 2; i++) {
+ data[i] = kzalloc(headroom + tailroom + data_size, GFP_KERNEL);
+               if (!data[i])
+                       return NULL;
+               skb[i] = build_skb(data[i], 0);
+               if (!skb[i]) {
+                       kfree(data[i]);
+                       return NULL;
+               }
+               skb_reserve(skb[i], headroom);
+               skb_put(skb[i], data_size);
+               skb[i]->protocol = htons(ETH_P_IP);
+               skb_reset_network_header(skb[i]);
+               skb_set_mac_header(skb[i], -ETH_HLEN);
+
+               skb_add_rx_frag(skb[i], skb_shinfo(skb[i])->nr_frags,
+                        page, 0, 64, 64);
+       }
+
+       /* setup shinfo */
+       skb_shinfo(skb[0])->gso_size = 1448;
+       skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4;
+       skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY;
+       skb_shinfo(skb[0])->gso_segs = 0;
+       skb_shinfo(skb[0])->frag_list = skb[1];
+
+       /* adjust skb[0]'s len */
+       skb[0]->len += skb[1]->len;
+       skb[0]->data_len += skb[1]->data_len;
+       skb[0]->truesize += skb[1]->truesize;
+
+       return skb[0];
+}
+
+static void test_skb_segment(void) {
+       netdev_features_t features;
+       struct sk_buff *skb;
+
+ features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
+       features |= NETIF_F_RXCSUM;
+       skb = build_test_skb();
+       if (!skb)
+               pr_info("Failed in test_skb_segment:build_test_skb!");
+
+       if (skb_segment(skb, features))
+               pr_info("Success in test_skb_segment!");
+       else
+               pr_info("Failed in test_skb_segment!");
+}
+
 static __init int test_bpf(void)
 {
        int i, err_cnt = 0, pass_cnt = 0;
@@ -6631,7 +6692,8 @@ static int __init test_bpf_init(void)
        if (ret < 0)
                return ret;

-       ret = test_bpf();
+       // ret = test_bpf();
+       test_skb_segment();

        destroy_bpf_tests();
        return ret;


=============  END ================


On 3/13/18 6:15 PM, Eric Dumazet wrote:


On 03/13/2018 05:35 PM, Eric Dumazet wrote:


On 03/13/2018 05:26 PM, Eric Dumazet wrote:


On 03/13/2018 05:04 PM, Alexei Starovoitov wrote:
On 3/13/18 4:27 PM, Eric Dumazet wrote:


On 03/13/2018 04:09 PM, Alexei Starovoitov wrote:

we have bpf_skb_proto_6_to_4() that was used by cilium for long time.
It's not clear why it's not crashing there, but we cannot just
reject changing proto in bpf programs now.
We have to fix whatever needs to be fixed in skb_segment
(if bug is there) or fix whatever necessary on mlx5 side.
In bpf helper we mark it as SKB_GSO_DODGY just like packets coming
through virtio would do, so if skb_segment() needs to do something
special with skb the SKB_GSO_DODGY flag is already there.

'Fixing' skb_segment(), I did that a long time ago and Herbert Xu was
not happy with the fix and provided something else.

any link to your old patches and discussion?

I think since mlx4 can do tso on them and the packets came out
correct on the wire, there is nothing fundamentally wrong with
changing gso_size. Just tricky to teach skb_segment.


The world is not mlx4 only. Some NIC will ask skb_segment() fallback segmentation for various reasons (like skb->len above a given limit like 16KB)

Maybe https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg255549.html&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=x9lG2k53B7AFsnNx6t2DgbXt06J-sLznZIHk6YdGoGg&s=0Nxx2G8PtDAEMCFAvQ7kxYTXVr9aHdOolP1KB_lnmes&e=


Herbert patch :

commit 9d8506cc2d7ea1f911c72c100193a3677f6668c3
Author: Herbert Xu <herb...@gondor.apana.org.au>
Date:   Thu Nov 21 11:10:04 2013 -0800

     gso: handle new frag_list of frags GRO packets


I found my initial patch.

https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg255452.html&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=x9lG2k53B7AFsnNx6t2DgbXt06J-sLznZIHk6YdGoGg&s=VuWRpUdJwBwTxpnMNZYgKvQANLL5UA7hZnTFZsQlK6c&e=


Reply via email to