Xuan Zhuo wrote:
> This patch is used to construct skb based on page to save memory copy
> overhead.
> 
> This has one problem:
> 
> We construct the skb by fill the data page as a frag into the skb. In
> this way, the linear space is empty, and the header information is also
> in the frag, not in the linear space, which is not allowed for some
> network cards. For example, Mellanox Technologies MT27710 Family
> [ConnectX-4 Lx] will get the following error message:
> 
>     mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, 
> opcode 0xd, syndrome 0x1, vendor syndrome 0x68
>     00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
>     WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
>     00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
>     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
>     00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
> 
> I also tried to use build_skb to construct skb, but because of the
> existence of skb_shinfo, it must be behind the linear space, so this
> method is not working. We can't put skb_shinfo on desc->addr, it will be
> exposed to users, this is not safe.
> 
> Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
> network card supports the header information of the packet in the frag
> and not in the linear space.
> 
> ---------------- Performance Testing ------------
> 
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t  -S -s <msg size>
> ```
> 
> Test result data:
> 
> size    64      512     1024    1500
> copy    1916747 1775988 1600203 1440054
> page    1974058 1953655 1945463 1904478
> percent 3.0%    10.0%   21.58%  32.3%

Looks like a good perf bump. Some easy suggestions below

> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> +                                  struct xdp_desc *desc, int *err)
> +{

Passing a 'int *err' here is ugly IMO use the ERR_PTR/PTR_ERR macros
and roll it into the return value.

or maybe use the out: pattern used in the kernel, but just doing direct
returns like now but with ERR_PTR() would also be fine.

> +     struct sk_buff *skb ;
        struct sk_buff *skb = NULL;
        err = -ENOMEM;
> +
> +     if (xs->dev->features & NETIF_F_SKB_NO_LINEAR) {
> +             skb = xsk_build_skb_zerocopy(xs, desc);
> +             if (unlikely(!skb)) {
                        goto out

> +                     *err = -ENOMEM;
> +                     return NULL;
> +             }
> +     } else {
> +             char *buffer;
> +             u64 addr;
> +             u32 len;
> +             int err;
> +
> +             len = desc->len;
> +             skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> +             if (unlikely(!skb)) {
                        goto out;
> +                     *err = -ENOMEM;
> +                     return NULL;
> +             }
> +
> +             skb_put(skb, len);
> +             addr = desc->addr;
> +             buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> +             err = skb_store_bits(skb, 0, buffer, len);
> +
> +             if (unlikely(err)) {
> +                     kfree_skb(skb);

                        err = -EINVAL;
                        goto out

> +                     *err = -EINVAL;
> +                     return NULL;
> +             }
> +     }
> +
> +     skb->dev = xs->dev;
> +     skb->priority = xs->sk.sk_priority;
> +     skb->mark = xs->sk.sk_mark;
> +     skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> +     skb->destructor = xsk_destruct_skb;
> +
> +     return skb;

out:
        kfree_skb(skb)
        return ERR_PTR(err);

> +}
> +

Otherwise looks good thanks.

Reply via email to