On Tue, May 15, 2018 at 02:13:50PM +0200, Jesper Dangaard Brouer wrote:
> This patch change the API for ndo_xdp_xmit to support bulking
> xdp_frames.
>
> When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
> Most of the slowdown is caused by DMA API indirect function calls, but
> also the net_device->ndo_xdp_xmit() call.
>
> Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
> single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
> performance improved:
>  for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
>  for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
>
> With frames avail as a bulk inside the driver ndo_xdp_xmit call,
> further optimizations are possible, like bulk DMA-mapping for TX.
>
> Testing without CONFIG_RETPOLINE show the same performance for
> physical NIC drivers.
>
> The virtual NIC driver tun sees a huge performance boost, as it can
> avoid doing per frame producer locking, but instead amortize the
> locking cost over the bulk.
>
> V2: Fix compile errors reported by kbuild test robot <l...@intel.com>
>
> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 +++++++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 ++++++--
>  drivers/net/tun.c                             |   37 +++++++++-----
>  drivers/net/virtio_net.c                      |   66 +++++++++++++++++++------
>  include/linux/netdevice.h                     |   14 +++--
>  include/net/page_pool.h                       |    5 +-
>  include/net/xdp.h                             |    1
>  include/trace/events/xdp.h                    |   10 ++--
>  kernel/bpf/devmap.c                           |   33 ++++++++-----
>  net/core/filter.c                             |    4 +-
>  net/core/xdp.c                                |   20 ++++++--
>  samples/bpf/xdp_monitor_kern.c                |   10 ++++
>  samples/bpf/xdp_monitor_user.c                |   35 +++++++++++--
>  14 files changed, 206 insertions(+), 78 deletions(-)

This patch has to be split into at least five:
- bpf and net core piece
- intel driver changes
- tun/virtio changes
- addition of tracepoints
- addition to samples

Putting changes from all over the areas into one patch makes it harder
to review, bisect, ack, test, and resolve merge conflicts.
Same issue with 3/4 as well. Please split it into two (core and samples).