This patch set introduces new infrastructure for programmatically processing packets in the earliest stages of rx, as part of an effort others are calling Express Data Path (XDP) [1]. Start this effort by introducing a new bpf program type for early packet filtering, before even an skb has been allocated.
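
For readers new to the idea, here is a rough userspace model of the hook contract, not the kernel code itself. All names (phys_dev_action, phys_dev_md, rx_should_pass) are hypothetical stand-ins, and the enum values are assumptions: the program sees a small metadata struct, returns an explicit action code, and the driver drops the frame only when it sees the explicit drop code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action codes; the real names and values live in the patches. */
enum phys_dev_action {
	PHYS_DEV_DROP = 0,
	PHYS_DEV_OK   = 1,
};

/* Metadata handed to the program; modeled with a length field only. */
struct phys_dev_md {
	uint32_t len;
};

typedef int (*phys_dev_prog_t)(const struct phys_dev_md *md);

/* Example filter: drop frames too short to hold an Ethernet header. */
static int drop_short_frames(const struct phys_dev_md *md)
{
	assert(md != NULL);
	return md->len < 14 ? PHYS_DEV_DROP : PHYS_DEV_OK;
}

/*
 * Driver-side decision, made before any skb would be allocated.
 * Returns 1 if the frame continues up the normal rx path, 0 if dropped.
 * Assumption: only the explicit drop code drops the frame.
 */
static int rx_should_pass(phys_dev_prog_t prog, uint32_t frame_len)
{
	struct phys_dev_md md = { .len = frame_len };

	if (prog == NULL)
		return 1;	/* no program attached */
	return prog(&md) != PHYS_DEV_DROP;
}
```

With this model, rx_should_pass(drop_short_frames, 10) reports a drop, while a 64-byte frame, or any frame when no program is attached, passes through.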

With this, hope to enable line rate filtering; the initial implementation provides drop/allow actions only.

Patch 1 introduces the new prog type and helpers for validating the bpf program. A new userspace struct is defined containing only len as a field, with others to follow in the future.
In patch 2, create a new ndo to pass the bpf prog to supporting drivers.
In patch 3, expose a new rtnl option to userspace.
In patch 4, enable support in the mlx4 driver. No skb allocation is required; instead a static percpu skb is kept in the driver and minimally initialized for each driver frag.
In patch 5, create a sample drop and count program. With a single core, achieved ~20 Mpps drop rate on a 40G mlx4. This includes packet data access, bpf array lookup, and increment.

Interestingly, accessing packet data from the program did not have a noticeable impact on performance. Even so, future enhancements to prefetching / batching / page-allocs should hopefully improve the performance in this path.

[1] https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf

v2:
  1/5: Drop xdp from types, instead consistently use bpf_phys_dev_.
    Introduce enum for return values from phys_dev hook.
  2/5: Move prog->type check to just before invoking ndo.
    Change ndo to take a bpf_prog * instead of fd.
    Add ndo_bpf_get rather than keeping a bool in the netdev struct.
  3/5: Use ndo_bpf_get to fetch bool.
  4/5: Enforce that only 1 frag is ever given to bpf prog by disallowing
    mtu to increase beyond FRAG_SZ0 when bpf prog is running, or conversely
    to set a bpf prog when priv->num_frags > 1.
    Rename pseudo_skb to bpf_phys_dev_md.
    Implement ndo_bpf_get.
    Add dma sync just before invoking prog.
    Check for explicit bpf return code rather than nonzero.
    Remove increment of rx_dropped.
  5/5: Use explicit bpf return code in example.
    Update commit log with higher pps numbers.
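
As a rough picture of what the drop and count sample in 5/5 exercises (packet data access, array lookup, increment, explicit return code), here is a plain userspace sketch. It is not the sample itself: keying the counters by IPv4 protocol, passing non-IPv4 frames, and the ACTION_* names and values are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes standing in for the phys_dev enum. */
enum { ACTION_DROP = 0, ACTION_OK = 1 };

#define ETH_HDR_LEN   14	/* dst mac + src mac + ethertype */
#define ETHERTYPE_IP  0x0800
#define IP_PROTO_OFF  9		/* protocol byte offset in the IPv4 header */

/* Stand-in for a bpf array map: one drop counter per IP protocol. */
static uint64_t drop_count[256];

/*
 * Read the IPv4 protocol byte from a single-frag frame, bump the
 * matching counter, and return the explicit drop code. Frames that are
 * too short or not IPv4 are passed (an assumption of this sketch).
 */
static int drop_and_count(const uint8_t *data, size_t len)
{
	uint16_t ethertype;
	uint8_t proto;

	assert(data != NULL);

	/* Bounds check before touching packet data. */
	if (len < ETH_HDR_LEN + IP_PROTO_OFF + 1)
		return ACTION_OK;

	ethertype = (uint16_t)((data[12] << 8) | data[13]);
	if (ethertype != ETHERTYPE_IP)
		return ACTION_OK;

	proto = data[ETH_HDR_LEN + IP_PROTO_OFF];	/* array lookup key */
	drop_count[proto]++;				/* increment */
	return ACTION_DROP;
}
```

Feeding it an IPv4/UDP frame increments drop_count[17] and returns ACTION_DROP; a frame with any other ethertype is left alone.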

Brenden Blanco (5):
  bpf: add PHYS_DEV prog type for early driver filter
  net: add ndo to set bpf prog in adapter rx
  rtnl: add option for setting link bpf prog
  mlx4: add support for fast rx drop bpf program
  Add sample for adding simple drop program to link

 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  65 +++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |  25 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   6 +
 include/linux/netdevice.h                      |  13 +++
 include/uapi/linux/bpf.h                       |  14 +++
 include/uapi/linux/if_link.h                   |   1 +
 kernel/bpf/verifier.c                          |   1 +
 net/core/dev.c                                 |  38 ++++++
 net/core/filter.c                              |  68 +++++++++++
 net/core/rtnetlink.c                           |  12 ++
 samples/bpf/Makefile                           |   4 +
 samples/bpf/bpf_load.c                         |   8 ++
 samples/bpf/netdrvx1_kern.c                    |  26 +++++
 samples/bpf/netdrvx1_user.c                    | 155 +++++++++++++++++++++++++
 14 files changed, 432 insertions(+), 4 deletions(-)
 create mode 100644 samples/bpf/netdrvx1_kern.c
 create mode 100644 samples/bpf/netdrvx1_user.c

-- 
2.8.0