On 02.05.20 18:24, Eric Dumazet wrote:
>
>
> On 5/2/20 9:10 AM, Julian Wiedmann wrote:
>> On 02.05.20 17:40, Eric Dumazet wrote:
>>> On Sat, May 2, 2020 at 7:56 AM Julian Wiedmann <j...@linux.ibm.com> wrote:
>>>>
>>>> On 22.04.20 18:13, Eric Dumazet wrote:
[...]
>>>>
>>>>
>>>>> By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
>>>>>
>>>>> This patch does not change the prior behavior of gro_flush_timeout
>>>>> if used alone: NIC hard irqs should be rearmed as before.
>>>>>
>>>>> One concrete usage can be:
>>>>>
>>>>> echo 20000 >/sys/class/net/eth1/gro_flush_timeout
>>>>> echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
>>>>>
>>>>> If at least one packet is retired, then we will reset napi counter
>>>>> to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
>>>>> of the queue.
>>>>>
>>>>> On busy queues, this should avoid NIC hard IRQ, while before this patch
>>>>> IRQ avoidance was only possible if napi->poll() was exhausting its budget
>>>>> and not calling napi_complete_done().
>>>>>
>>>>
>>>> I was confused here for a second, so let me just clarify how this is
>>>> intended to look for pure TX completion IRQs:
>>>>
>>>> napi->poll() calls napi_complete_done() with an accurate work_done value,
>>>> but then still returns 0 because TX completion work doesn't consume NAPI
>>>> budget.
>>>
>>>
>>> If the napi budget was consumed, the driver does _not_ call
>>> napi_complete() or napi_complete_done() anyway.
>>>
>>
>> I was thinking of "TX completions are cheap and don't consume _any_ NAPI
>> budget, ever" as the current consensus, but looking at the mlx4 code that
>> evidently isn't true for all drivers.
>
> TX completions are not cheap in many cases.
>
> Doing the unmap stuff can be costly in IOMMU world, and freeing skbs
> can also be expensive.
> Add to this that the TCP stack might be called back (via skb->destructor())
> to add more packets to the qdisc/device.
>
> So effectively using the budget as a limit might help in some stress
> situations, by not re-enabling NIC interrupts, even before the
> napi_defer_hard_irqs addition.
>

Neat, thanks for sharing this. Now I also see the tricks that mlx4 plays
to still get netpoll working... fun.
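A rough userspace sketch of the driver-side pattern discussed above, assuming a poll routine that charges each reclaimed TX descriptor against the NAPI budget (the approach the thread attributes to mlx4, without reproducing mlx4's actual code). All names here (struct fake_txq, fake_tx_poll(), fake_complete_done()) are hypothetical stand-ins, not kernel APIs, and the stub completion simply always permits re-arming the interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TX queue model, not a kernel structure. */
struct fake_txq {
	int pending;      /* completed descriptors not yet reclaimed */
	bool irq_enabled; /* models the NIC's per-queue interrupt mask */
};

/* Hypothetical stand-in for napi_complete_done(); this stub always
 * allows the driver to re-enable the hard IRQ. */
static bool fake_complete_done(int work_done)
{
	(void)work_done;
	return true;
}

/* Poll routine that counts every reclaimed TX descriptor against the
 * budget (in a real driver: DMA unmap + skb free per descriptor). */
static int fake_tx_poll(struct fake_txq *q, int budget)
{
	int work_done = 0;

	while (q->pending > 0 && work_done < budget) {
		q->pending--;
		work_done++;
	}
	assert(work_done <= budget);

	/* Budget exhausted: stay in polling mode, do not complete,
	 * and leave the hard IRQ masked. */
	if (work_done == budget)
		return budget;

	/* Less work than budget: complete, and re-arm the IRQ only if
	 * the completion call allows it. */
	if (fake_complete_done(work_done))
		q->irq_enabled = true;
	return work_done;
}
```

With 100 pending completions and a budget of 64, the first call returns 64 and leaves the interrupt masked; the second reclaims the remaining 36, completes, and re-arms it.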
>>
>>
>>> If the budget is consumed, then napi_complete_done(napi, X>0) allows
>>> napi_complete_done() to return 0 if napi_defer_hard_irqs is not 0.
>>>
>>> This means that the NIC hard irq will stay disabled for at least one more
>>> round.
>>>
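A small userspace model of the deferral logic as described in this thread: any retired work refills the counter from napi_defer_hard_irqs, and while the counter is positive and gro_flush_timeout is non-zero, completion returns false so the hard IRQ stays masked and a timer drives the next poll. This is a sketch, not the kernel's code; struct fake_dev, struct fake_napi and fake_napi_complete_done() are hypothetical, the timer is just a stored value, and GRO-related conditions are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device knobs mirroring the two sysfs files. */
struct fake_dev {
	unsigned long gro_flush_timeout; /* ns; 0 means no timer */
	int napi_defer_hard_irqs;        /* 0 disables deferral */
};

/* Hypothetical per-NAPI state. */
struct fake_napi {
	const struct fake_dev *dev;
	int defer_hard_irqs_count;
	unsigned long timer_ns;          /* 0 means no timer armed */
};

/* Returns true if the driver may re-enable the NIC hard IRQ, false if
 * the (modelled) timer will reschedule polling instead. */
static bool fake_napi_complete_done(struct fake_napi *n, int work_done)
{
	unsigned long timeout = 0;
	bool ret = true;

	assert(work_done >= 0);
	if (work_done > 0) {
		/* Something was retired: refill the deferral counter. */
		timeout = n->dev->gro_flush_timeout;
		n->defer_hard_irqs_count = n->dev->napi_defer_hard_irqs;
	}
	if (n->defer_hard_irqs_count > 0) {
		n->defer_hard_irqs_count--;
		timeout = n->dev->gro_flush_timeout;
		if (timeout)
			ret = false; /* keep the hard IRQ masked */
	}
	n->timer_ns = timeout; /* models arming the poll timer */
	return ret;
}
```

With the 20000/10 settings from the patch description, one call with work_done > 0 followed by nine idle calls returns false ten times in a row (ten timer-driven scans with the hard IRQ masked); only the eleventh call lets the driver re-arm it. With napi_defer_hard_irqs left at 0, completion returns true as before.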