On 10/2/20 5:25 PM, John Fastabend wrote:
> Lorenzo Bianconi wrote:
>> This series introduces XDP multi-buffer support. The mvneta driver is
>> the first to support these new "non-linear" xdp_{buff,frame}.
>> Reviewers, please focus on how these new types of xdp_{buff,frame}
>> packets traverse the different layers, and on the layout design. It is
>> on purpose that the BPF helpers are kept simple, as we don't want to
>> expose the internal layout, so as to allow later changes.
>>
>> For now, to keep the design simple and to maintain performance, the
>> XDP BPF program (still) only has access to the first buffer. It is
>> left for later (another patchset) to add payload access across
>> multiple buffers. This patchset should still allow for these future
>> extensions. The goal is to lift the MTU restriction that comes with
>> XDP while maintaining the same performance as before.
>>
>> The main idea for the new multi-buffer layout is to reuse the same
>> layout used for non-linear SKBs. This relies on the "skb_shared_info"
>> struct at the end of the first buffer to link together subsequent
>> buffers. Keeping the layout compatible with SKBs is also done to ease
>> and speed up creating an SKB from an xdp_{buff,frame}. Converting an
>> xdp_frame to an SKB and delivering it to the network stack is shown in
>> the cpumap code (patch 13/13).
>
> Using the end of the buffer for the skb_shared_info struct is going to
> become driver API, so unwinding it if it proves to be a performance
> issue is going to be ugly. So, same question as before: for the use
> case where we receive a packet and do XDP_TX with it, how do we avoid
> the cache-miss overhead? This is not just a hypothetical use case; the
> Facebook load balancer does this, as does Cilium, and allowing it with
> multi-buffer packets >1500B would be useful.
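[Editor's note: for readers unfamiliar with the layout described above, a
minimal sketch of the idea follows. The helper name and exact alignment
are illustrative assumptions based on the cover letter's description, not
the series' final API.]

```c
#include <linux/skbuff.h>
#include <net/xdp.h>

/*
 * Sketch of the proposed layout: the skb_shared_info that links the
 * subsequent buffers lives in the tail of the first buffer, at the
 * same position a non-linear SKB keeps it, so it can be derived from
 * the buffer bounds alone. Helper name is hypothetical.
 */
static inline struct skb_shared_info *
xdp_buff_shared_info(struct xdp_buff *xdp)
{
	/* frame_sz covers the whole buffer, including the tailroom
	 * reserved for the (cacheline-aligned) skb_shared_info.
	 */
	return (struct skb_shared_info *)(xdp->data_hard_start +
					  xdp->frame_sz -
					  SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
}
```

Keeping the shared_info at this SKB-compatible offset is what makes the
xdp_frame-to-SKB conversion cheap: the stack can adopt it in place. It is
also why John's cache-miss question matters, since the tail of the buffer
sits on a cacheline the RX path may otherwise never touch.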
[...]

Fully agree. My other question would be: is someone else currently in the
process of implementing this scheme for a 40G+ NIC? My concern is that
the numbers below are rather on the lower end of the spectrum, so I would
like to see a comparison of XDP as-is today vs. XDP multi-buff on a
higher-end NIC, so that we have a picture of how well the currently
designed scheme works there and which performance issues we'll run into,
e.g. under a typical XDP L4 load balancer scenario with XDP_TX. I think
this would be crucial before the driver API becomes 'sort of' set in
stone, others start adapting it, and changing the design becomes painful.
Do the ena folks have an implementation ready as well? And what about
virtio_net, for example, is anyone committing there too? Typically, for
such features to land, we require at least two drivers implementing them.
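[Editor's note: for context, the XDP_TX pattern both replies refer to is
roughly the following. This is a generic sketch of a hairpin-forwarding
XDP program, not code from the Facebook (Katran) or Cilium load
balancers; real programs do full tuple parsing and backend selection.]

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_lb_tx(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	unsigned char tmp[ETH_ALEN];

	if (data + sizeof(*eth) > data_end)
		return XDP_PASS;

	/* Rewrite headers (here: just swap MACs) and bounce the frame
	 * back out the same NIC. Note the program only touches the
	 * first buffer; the cache-miss question above is about the
	 * driver having to read the tail skb_shared_info on TX for
	 * multi-buffer frames even when the program never does.
	 */
	__builtin_memcpy(tmp, eth->h_source, ETH_ALEN);
	__builtin_memcpy(eth->h_source, eth->h_dest, ETH_ALEN);
	__builtin_memcpy(eth->h_dest, tmp, ETH_ALEN);

	return XDP_TX;
}

char _license[] SEC("license") = "GPL";
```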
>> Typical use cases for this series are:
>>
>> - Jumbo-frames
>> - Packet header split (please see Google's use-case @ NetDevConf 0x14, [0])
>> - TSO
>>
>> More info about the main idea behind this approach can be found here
>> [1][2].
>>
>> We carried out some throughput tests in a standard linear-frame
>> scenario in order to verify that we did not introduce any performance
>> regression by adding xdp multi-buff support to mvneta. Offered load is
>> ~1000Kpps, packet size is 64B, mvneta descriptor size is one PAGE.
>>
>> commit 879456bedbe5 ("net: mvneta: avoid possible cache misses in
>> mvneta_rx_swbm"):
>> - xdp-pass:     ~162Kpps
>> - xdp-drop:     ~701Kpps
>> - xdp-tx:       ~185Kpps
>> - xdp-redirect: ~202Kpps
>>
>> mvneta xdp multi-buff:
>> - xdp-pass:     ~163Kpps
>> - xdp-drop:     ~739Kpps
>> - xdp-tx:       ~182Kpps
>> - xdp-redirect: ~202Kpps
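[Editor's note: the xdp-pass/xdp-drop style benchmarks above typically
boil down to trivial programs of the following shape. This is a sketch;
the exact test programs used for these numbers are not shown in the
thread.]

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Minimal programs of the kind behind "xdp-pass"/"xdp-drop" numbers:
 * the measured cost is the driver RX path plus program dispatch, so
 * they expose any per-packet overhead added by multi-buff support.
 */
SEC("xdp")
int xdp_pass_prog(struct xdp_md *ctx)
{
	return XDP_PASS;
}

SEC("xdp")
int xdp_drop_prog(struct xdp_md *ctx)
{
	return XDP_DROP;
}

char _license[] SEC("license") = "GPL";
```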
[...]
