On 21 May 2018 at 20:55, Björn Töpel <bjorn.to...@gmail.com> wrote:
>
> 2018-05-21 14:34 GMT+02:00 Mykyta Iziumtsev <mykyta.iziumt...@linaro.org>:
> > Hi Björn and Magnus,
> >
> > (This thread is a follow up to private dialogue. The intention is to
> > let community know that AF_XDP can be enhanced further to make it
> > compatible with wider range of NIC vendors).
> >
>
> Mykyta, thanks for doing the write-up and sending it to the netdev
> list! The timing could not be better -- we need to settle on an uapi
> that works for all vendors prior enabling it in the kernel.

[Resending with vger-compatible formatting.]

So! The discussion here seems to be about how to make the XDP uapi
accommodate all hardware vendors, but I wanted to chime in with a
userspace application developer perspective (remember us? ;-))

These days more and more people understand the weird and wonderful
ways that NICs want to deal with packet memory. Scatter-gather lists;
typewriter buffers; payload inline in descriptors; metadata inline in
payload; constraints on buffer size; constraints on buffer alignment;
etc; etc; etc.

How about userspace applications though? We also have our own ideas
about the ways that things should be done. I think there is a
fundamental tension here: the more flexibility you provide to
hardware, the more constraints you impose on applications, and vice
versa.

To be concrete, let me explain the peculiar way that we handle packet
memory in the Snabb application.

Snabb uses a simple representation of packets in memory:

    struct packet {
        uint16_t length;
        unsigned char data[10 * 1024];
    }

and a special allocator so that the virtual address of each packet:

- Is identical in every process that can share traffic;
- Can be mapped on demand (via SIGSEGV fault handler);
- Can be used to calculate the DMA (physical) address;
- Can be used to calculate how much headroom is available.

So our scheme is fairly nuanced. Just now this seems to fit well with
most NICs, which allow scatter-gather operation from memory allocated
independently by the application, but we have to resolve an impedance
mismatch (with a copy) for e.g. the typewriter model. Overall this
situation is quite acceptable.

How would this fit with the XDP uapi though? Can we preserve these
properties of our packets and make them XDP-compatible?

The ideal for us would probably be to replace the code that allocates
a HugeTLB for packet data with an equivalent that allocates a chunk of
XDP-compatible memory that we can slice up and mremap to suit our
taste.

If that is not possible then I see a couple of alternatives:

One would be to drop all of our invariants on packet addresses and
switch to a more middle-of-the-road design that puts everything inline
into the packet (an "sk_buff-alike"). Then we would outsource all the
allocation to the kernel, which would do it specially to suit the
hardware from $VENDOR. (And hopefully deal somehow with mixing traffic
from $OTHERVENDOR too, etc.)

The other alternative would be to preserve our current packet
structure and introduce a copy into separate XDP memory on
transmit/receive. This is the approach that we take today with
vhost-user and is the approach we would take if we supported a
"typewriter" style NIC too.

I'm not immediately wild about either of those options though, and I
am not sure how keen the next wave of application developers turning
up over the next 5-10 years and "doing it our way" will be either.

So, anyway, that is my braindump on trying to understand how suitable
XDP would be for us as application developers, and how much of this
depends on the fine details that are being discussed on this thread.
I hope this perspective is a useful complement to the feedback from
hardware makers.

Cheers,
-Luke
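
P.S. To make the address properties listed above a little more
concrete, here is a minimal sketch in C of how a packet's virtual
address could encode both its DMA address and its headroom. The tag
value, the slot alignment, and all function names are assumptions
invented for illustration; this is not a copy of Snabb's actual code,
and the addresses are handled as plain integers so that nothing needs
to be mapped to try it out.

```c
#include <stdint.h>

/* Assumed for illustration: every packet virtual address carries this
 * fixed tag in its high bits, and the remaining bits are the physical
 * (DMA) address. The value is hypothetical. */
#define PACKET_DMA_TAG    0x500000000000ULL

/* Assumed for illustration: packet slots start on this boundary, so
 * the low bits of a packet address tell how far it sits into its slot. */
#define PACKET_SLOT_ALIGN 512ULL

/* Virtual address -> DMA address: flip the tag bits off. */
static inline uint64_t packet_virt_to_phys(uint64_t virt)
{
    return virt ^ PACKET_DMA_TAG;
}

/* DMA address -> virtual address: flip the tag bits on. Because the
 * mapping is a pure function of the address, every process that maps
 * the memory this way sees the packet at the same virtual address. */
static inline uint64_t packet_phys_to_virt(uint64_t phys)
{
    return phys ^ PACKET_DMA_TAG;
}

/* Headroom in bytes: the offset of the packet from the start of its
 * aligned slot, i.e. the space available in front of the packet. */
static inline uint64_t packet_headroom(uint64_t virt)
{
    return virt & (PACKET_SLOT_ALIGN - 1);
}
```

With these assumed constants, a packet at virtual address
0x500000001280 would have DMA address 0x1280 and 128 bytes of
headroom. The point is only that all of this falls out of the pointer
itself, with no per-packet metadata, which is exactly the property
that is hard to keep if somebody else chooses where the packets live.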