On Tue, May 15, 2018 at 02:13:50PM +0200, Jesper Dangaard Brouer wrote:
> This patch change the API for ndo_xdp_xmit to support bulking
> xdp_frames.
> 
> When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
> Most of the slowdown is caused by DMA API indirect function calls, but
> also the net_device->ndo_xdp_xmit() call.
> 
> Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
> single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
> performance improved:
>  for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
>  for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
> 
> With frames avail as a bulk inside the driver ndo_xdp_xmit call,
> further optimizations are possible, like bulk DMA-mapping for TX.
> 
> Testing without CONFIG_RETPOLINE show the same performance for
> physical NIC drivers.
> 
> The virtual NIC driver tun sees a huge performance boost, as it can
> avoid doing per frame producer locking, but instead amortize the
> locking cost over the bulk.
> 
> V2: Fix compile errors reported by kbuild test robot <l...@intel.com>
> 
> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 +++++++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 ++++++--
>  drivers/net/tun.c                             |   37 +++++++++-----
>  drivers/net/virtio_net.c                      |   66 +++++++++++++++++++------
>  include/linux/netdevice.h                     |   14 +++--
>  include/net/page_pool.h                       |    5 +-
>  include/net/xdp.h                             |    1 
>  include/trace/events/xdp.h                    |   10 ++--
>  kernel/bpf/devmap.c                           |   33 ++++++++-----
>  net/core/filter.c                             |    4 +-
>  net/core/xdp.c                                |   20 ++++++--
>  samples/bpf/xdp_monitor_kern.c                |   10 ++++
>  samples/bpf/xdp_monitor_user.c                |   35 +++++++++++--
>  14 files changed, 206 insertions(+), 78 deletions(-)

This patch has to be split into at least five:
- bpf and net core piece
- intel driver changes
- tun/virtio changes
- addition of tracepoints
- addition to samples
Putting changes from so many different areas into one patch makes it harder
to review, bisect, ack, test, and resolve merge conflicts.

Same issue with 3/4 as well. Please split it into two (core and samples).
