On Mon, 7 May 2018 11:13:58 +0200 Magnus Karlsson <magnus.karls...@gmail.com> wrote:
> On Sat, May 5, 2018 at 2:34 AM, Alexei Starovoitov
> <alexei.starovoi...@gmail.com> wrote:
> > On Fri, May 04, 2018 at 01:22:17PM +0200, Magnus Karlsson wrote:
> >> On Fri, May 4, 2018 at 1:38 AM, Alexei Starovoitov
> >> <alexei.starovoi...@gmail.com> wrote:
> >> > On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
> >> >> On 05/02/2018 01:01 PM, Björn Töpel wrote:
> >> >> > From: Björn Töpel <bjorn.to...@intel.com>
> >> >> >
> >> >> > This patch set introduces a new address family called AF_XDP that is
> >> >> > optimized for high performance packet processing and, in upcoming
> >> >> > patch sets, zero-copy semantics. In this patch set, we have removed
> >> >> > all zero-copy related code in order to make it smaller, simpler and
> >> >> > hopefully more review friendly. This patch set only supports copy-mode
> >> >> > for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> >> >> > for RX using the XDP_DRV path. Zero-copy support requires XDP and
> >> >> > driver changes that Jesper Dangaard Brouer is working on. Some of his
> >> >> > work has already been accepted. We will publish our zero-copy support
> >> >> > for RX and TX on top of his patch sets at a later point in time.
> >> >>
> >> >> +1, would be great to see it land this cycle. Saw few minor nits here
> >> >> and there but nothing to hold it up, for the series:
> >> >>
> >> >> Acked-by: Daniel Borkmann <dan...@iogearbox.net>
> >> >>
> >> >> Thanks everyone!
> >> >
> >> > Great stuff!
> >> >
> >> > Applied to bpf-next, with one condition.
> >> > Upcoming zero-copy patches for both RX and TX need to be posted
> >> > and reviewed within this release window.
> >> > If netdev community as a whole won't be able to agree on the zero-copy
> >> > bits we'd need to revert this feature before the next merge window.
> >>
> >> Thanks everyone for reviewing this. Highly appreciated.
> >>
> >> Just so we understand the purpose correctly:
> >>
> >> 1: Do you want to see the ZC patches in order to verify that the user
> >> space API holds? If so, we can produce an additional RFC patch set
> >> using a big chunk of code that we had in RFC V1. We are not proud of
> >> this code since it is clunky, but it hopefully proves the point with
> >> the uapi being the same.
> >>
> >> 2: And/Or are you worried about us all (the netdev community) not
> >> agreeing on a way to implement ZC internally in the drivers and the
> >> XDP infrastructure? This is not going to be possible to finish during
> >> this cycle since we do not like the implementation we had in RFC V1.
> >> Too intrusive and now we also have nicer abstractions from Jesper that
> >> we can use and extend to provide a (hopefully) much cleaner and less
> >> intrusive solution.
> >
> > short answer: both.
> >
> > Cleanliness and performance of the ZC code is not as important as
> > getting API right. The main concern that during ZC review process
> > we will find out that existing API has issues, so we have to
> > do this exercise before the merge window.
> > And RFC won't fly. Send the patches for real. They have to go
> > through the proper code review. The hackers of netdev community
> > can accept a partial, or a bit unclean, or slightly inefficient
> > implementation, since it can be and will be improved later,
> > but API we cannot change once it goes into official release.
> >
> > Here is the example of API concern:
> > this patch set added shared umem concept. It sounds good in theory,
> > but will it perform well with ZC ? Earlier RFCs didn't have that
> > feature. If it won't perform well than it shouldn't be in the tree.
> > The key reason to let AF_XDP into the tree is its performance promise.
> > If it doesn't perform we should rip it out and redesign.
>
> That is a fair point.
> We will try to produce patch sets for zero-copy
> RX and TX using the latest interfaces within this merge window. Just
> note that we will focus on this for the next week(s) instead of the
> review items that you and Daniel Borkmann submitted. If we get those
> patch sets out in time and we agree that they are a possible way
> forward, then we produce patches with your fixes. It was mainly small
> items, so should be quick.

I would like to see you create a new xdp_mem_type for this new
zero-copy type. This will allow other XDP redirect methods/types (e.g.
devmap and cpumap) to react appropriately when receiving a zero-copy
frame.

For devmap, I'm hoping we can allow/support using the ndo_xdp_xmit call
without (first) copying (into a newly allocated page). The argument is
that if an xsk-userspace app modifies a frame it is not allowed to
touch, then that is simply a bug in the program. (Note, this would also
allow using the ndo_xdp_xmit call for TX from xsk-userspace.)

For cpumap, it is hard to avoid a copy, but I'm hoping we could delay
the copy (and the allocation of the destination memory area) until we
are on the remote CPU. This is already the principle of cpumap: moving
the allocation of the SKB to the remote CPU.

For ZC to interact with the XDP redirect-core and return API, the
zero-copy memory type/allocator needs to provide an area for the
xdp_frame data to be stored in (as we cannot allow using top-of-frame
like the non-zero-copy variants), and xdp_frame needs to be extended
with a ZC umem-id. I imagine we can avoid any dynamic allocations, as we
know the number of frames up front (at bind and XDP_UMEM_REG time),
e.g. pre-allocate in the xdp_umem_reg() call and provide an
xdp_umem_get_xdp_frame lookup function.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
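
The pre-allocation idea in the last paragraph of the mail can be sketched in plain user-space C. This is only an illustration under assumptions, not kernel code: struct zc_frame_meta, struct zc_umem and the zc_umem_*() helpers are hypothetical names standing in for the xdp_frame storage, the umem, and the xdp_umem_reg()/xdp_umem_get_xdp_frame pair mentioned above. It assumes fixed-size frames, so the metadata slot for a buffer can be found by dividing its umem offset by the frame size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-frame xdp_frame storage, including the
 * umem id suggested in the mail. This is NOT the kernel's struct layout. */
struct zc_frame_meta {
	uint64_t addr;    /* offset of the frame inside the umem */
	uint32_t len;     /* packet length, filled in on receive */
	uint32_t umem_id; /* which umem this frame belongs to */
};

/* Hypothetical umem descriptor with one metadata slot per frame. */
struct zc_umem {
	uint32_t id;
	uint32_t frame_size;
	uint32_t nframes;
	struct zc_frame_meta *meta; /* allocated once, at registration */
};

/* Registration is the only place that allocates: the number of frames is
 * known here, so all metadata slots can be set up front. */
static int zc_umem_reg(struct zc_umem *u, uint32_t id,
		       uint64_t size, uint32_t frame_size)
{
	uint64_t n;
	uint32_t i;

	if (frame_size == 0 || size < frame_size)
		return -1;
	n = size / frame_size;
	if (n > UINT32_MAX)
		return -1;

	u->meta = calloc((size_t)n, sizeof(*u->meta));
	if (!u->meta)
		return -1;

	u->id = id;
	u->frame_size = frame_size;
	u->nframes = (uint32_t)n;
	for (i = 0; i < u->nframes; i++) {
		u->meta[i].addr = (uint64_t)i * frame_size;
		u->meta[i].umem_id = id;
	}
	return 0;
}

/* Lookup on the fast path: no allocation, just an index computation.
 * Returns NULL if addr lies outside the registered area. */
static struct zc_frame_meta *zc_umem_get_frame(struct zc_umem *u, uint64_t addr)
{
	uint64_t idx;

	if (u->frame_size == 0)
		return NULL;
	idx = addr / u->frame_size;
	if (idx >= u->nframes)
		return NULL;
	return &u->meta[idx];
}

/* Release the metadata array when the umem goes away. */
static void zc_umem_unreg(struct zc_umem *u)
{
	free(u->meta);
	u->meta = NULL;
	u->nframes = 0;
}
```

With four 2048-byte frames registered, any address inside the second frame (for example offset 2048 + 100) resolves to slot 1, and an address at or beyond the end of the area yields NULL. The point of the sketch is only that the per-packet path needs no allocator call; how the real xdp_frame and return API would carry the umem id is left open in the mail.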