On 2020-07-26 19:58, Eric Dumazet wrote: > > > On 7/26/20 10:19 AM, Felix Fietkau wrote: >> On 2020-07-26 18:49, Eric Dumazet wrote: >>> On 7/26/20 9:31 AM, Felix Fietkau wrote: >>>> For some drivers (especially 802.11 drivers), doing a lot of work in the >>>> NAPI >>>> poll function does not perform well. Since NAPI poll is bound to the CPU it >>>> was scheduled from, we can easily end up with a few very busy CPUs spending >>>> most of their time in softirq/ksoftirqd and some idle ones. >>>> >>>> Introduce threaded NAPI for such drivers based on a workqueue. The API is >>>> the >>>> same except for using netif_threaded_napi_add instead of netif_napi_add. >>>> >>>> In my tests with mt76 on MT7621 using threaded NAPI + a thread for tx >>>> scheduling >>>> improves LAN->WLAN bridging throughput by 10-50%. Throughput without >>>> threaded >>>> NAPI is wildly inconsistent, depending on the CPU that runs the tx >>>> scheduling >>>> thread. >>>> >>>> With threaded NAPI, throughput seems stable and consistent (and higher than >>>> the best results I got without it). >>> >>> Note that even with a threaded NAPI, you will not be able to use more than >>> one cpu >>> to process the traffic. >> For a single threaded NAPI user that's correct. The main difference here >> is that the CPU running the poll function does not have to be the same >> as the CPU that scheduled it, and it can change based on the load. >> That makes a huge difference in my tests. > > This really looks like there is a problem in the driver itself. > > Have you first checked that this patch was not hurting your use case ? > > commit 4cd13c21b207e80ddb1144c576500098f2d5f882 softirq: Let ksoftirqd do > its job > > If yes, your proposal would again possibly hurt user space threads competing > with a high priority workqueue, and packets would not be consumed by user > applications. > Having cpus burning 100% of cycles in kernel space is useless. I already checked that a while back, and this patch is not hurting my use case. I think it is actually helping, since I'm putting on enough load to keep most softirq activity in ksoftirqd.
One thing to consider about my use case is that I'm bridging traffic between an Ethernet interface and an 802.11 interface. Those packets do not go through user space. If I push enough traffic, the ksoftirqd instance running NAPI of the 802.11 device keeps the CPU 100% busy. I do not have any signficant user space activity during my tests. Since tx and rx NAPI are scheduled from the same IRQ which only fires on one CPU, they end up in the same ksoftirqd instance. Considering that one CPU is not enough to handle entire NAPI workload for the device, threaded NAPI helps by letting other (otherwise mostly idle) CPUs pick up some of the workload. > It seems you need two cpus per queue, I guess this might be because > you use a single NAPI for both tx and rx ? > > Have you simply tried to use two NAPI, as some Ethernet drivers do ? I'm already doing that. > Do not get me wrong, but scheduling a thread only to process one packet at a > time > will hurt common cases. > > Really I do not mind if you add a threaded NAPI, but it seems you missed > a lot of NAPI requirements in the proposed patch. > > For instance, many ->poll() handlers assume BH are disabled. > > Also part of RPS logic depends on net_rx_action() calling > net_rps_action_and_irq_enable() I will look into that, thanks. - Felix