On Wed, Jun 27, 2018 at 5:57 PM Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>
>
> On 06/27/2018 07:02 AM, Magnus Karlsson wrote:
> > There was a potential race in the TX completion code for
> > the SKB case when the TX napi thread and the error path
> > of the sendmsg code could both call the SKB destructor
> > at the same time. Fixed by introducing a spin_lock in the
> > destructor.
>
>
> Wow, what is the impact on performance ?
>
> Please describe a bit more what is the problem.
One process enters the sendmsg code of an AF_XDP socket in order to
send a frame. The execution eventually trickles down to the driver,
which is told to send the packet. However, the driver decides to drop
the packet due to some error condition (e.g., rings full) and frees
the SKB. This triggers the SKB destructor, and a completion is sent
to AF_XDP user space through one of its single-producer/single-consumer
queues.

At the same time, a TX interrupt fires on another core and dispatches
the TX completion code in the driver. The driver does its HW specific
things and ends up freeing the SKB associated with a previously
transmitted packet. This also triggers the SKB destructor, and another
completion is sent to AF_XDP user space through the same
single-producer/single-consumer queue.

With a pseudo call stack, it would look like this:

Core 1:
sendmsg() being called in the application
netdev_start_xmit()
Driver entered through ndo_start_xmit
Driver decides to free the SKB for some reason (e.g., rings full)
Destructor of SKB called
xskq_produce_addr() is called to signal completion to user space

Core 2:
TX completion irq
NAPI loop
Driver irq handler for TX completions
Frees the SKB
Destructor of SKB called
xskq_produce_addr() is called to signal completion to user space

We now have a violation of the single-producer/single-consumer
principle for our queues, as two threads are producing at the same
time on the same queue. A sketch of how the destructor is serialized
with the spin lock can be found at the end of this mail.

Let me know if this is clear enough and I will put it in the commit
message and submit a V2.

Regarding performance, I get around 1.74 Mpps for txonly both before
and after the introduction of the spin lock on my machine. There is
of course some impact due to the spin lock, but it is in the less
significant digits, which are too noisy for me to measure. But let us
say that the version without the spin lock got 1.745 Mpps in the best
case and the version with it got 1.735 Mpps in the worst case. That
would mean a maximum drop in performance of (1.745 - 1.735) / 1.745,
i.e., about 0.6%.

Will had some ideas in
https://www.spinics.net/lists/netdev/msg497859.html on how to improve
the performance of the TX SKB path for AF_XDP. We could always
implement these to try to recoup the performance loss due to the spin
lock.

/Magnus
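For reference, below is a rough sketch of what the serialized
destructor looks like. This is a sketch of the approach rather than
the final patch: the tx_completion_lock field added to struct
xdp_sock is the name used here, initialized with spin_lock_init() at
socket creation.

/* Both the sendmsg error path and the driver's TX completion NAPI
 * context can run this destructor concurrently for SKBs belonging to
 * the same socket. Serialize the two producers on the completion
 * queue with a spin lock; the irqsave variant is used since the
 * destructor can be called from hard irq context.
 */
static void xsk_destruct_skb(struct sk_buff *skb)
{
	u64 addr = (u64)(long)skb_shinfo(skb)->destructor_arg;
	struct xdp_sock *xs = xdp_sk(skb->sk);
	unsigned long flags;

	spin_lock_irqsave(&xs->tx_completion_lock, flags);
	/* Signal the completion to user space; warn if the
	 * completion queue unexpectedly has no room.
	 */
	WARN_ON_ONCE(xskq_produce_addr(xs->umem->cq, addr));
	spin_unlock_irqrestore(&xs->tx_completion_lock, flags);

	sock_wfree(skb);
}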