On Wed, Feb 15, 2017 at 5:26 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2017-02-15 at 16:52 +0200, Matan Barak (External) wrote:
>
>> So, in case of RDMA CQs, we add some per-CQE overhead of comparing the
>> list pointers and condition upon that. Maybe we could add an
>> invoke_tasklet boolean field on mlx4_cq and return its value from
>> mlx4_cq_completion.
>> That's way we could do invoke_tasklet |= mlx4_cq_completion(....);
>>
>> Outside the while loop we could just
>> if (invoke_tasklet)
>>         tasklet_schedule
>>
>> Anyway, I guess that even with per-CQE overhead, the performance impact
>> here is pretty negligible - so I guess that's fine too :)
>
>
> Real question or suggestion would be to use/fire a tasklet only under
> stress.
>
> Firing a tasklet adds a lot of latencies for user-space CQ completion,
> since softirqs might have to be handled by a kernel thread (ksoftirqd)
>

At least for the mlx4_en driver, we don't need this tasklet; it only adds
overhead, since we already have NAPI. We should consider removing it for
mlx4_en CQs and moving the tasklet handling to mlx4_ib.

I will ack the patch.
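
For reference, the invoke_tasklet idea quoted above could look roughly like the sketch below. This is a standalone userspace mock under my own assumptions, not the actual mlx4 code: struct mock_cq, mock_cq_completion(), mock_tasklet_schedule() and mock_eq_poll() are hypothetical stand-ins for mlx4_cq, mlx4_cq_completion(), tasklet_schedule() and the EQ polling loop. It only illustrates accumulating the flag inside the loop and scheduling once outside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx4_cq; not the driver's real layout. */
struct mock_cq {
    bool invoke_tasklet;  /* assumed field: true if completions are deferred to the tasklet */
    int  pending;         /* completions queued for the tasklet */
    int  handled_inline;  /* completions handled directly (e.g. NAPI-driven CQs) */
};

/* Counts how many times the mock tasklet was scheduled. */
static int tasklet_schedule_calls;

/* Mock of tasklet_schedule(); the real one takes a struct tasklet_struct *. */
static void mock_tasklet_schedule(void)
{
    tasklet_schedule_calls++;
}

/* Returns true when the caller should schedule the tasklet for this CQ. */
static bool mock_cq_completion(struct mock_cq *cq)
{
    if (cq->invoke_tasklet) {
        cq->pending++;
        return true;
    }
    cq->handled_inline++;
    return false;
}

/* Mock of the EQ polling loop: one entry in events[] per completion event. */
static void mock_eq_poll(struct mock_cq **events, size_t n)
{
    bool invoke_tasklet = false;

    assert(events != NULL || n == 0);

    /* Accumulate the flag per event instead of scheduling per event. */
    for (size_t i = 0; i < n; i++)
        invoke_tasklet |= mock_cq_completion(events[i]);

    /* Schedule at most once, and only if some CQ asked for it. */
    if (invoke_tasklet)
        mock_tasklet_schedule();
}
```

With this shape, a batch of events that touches only CQs with invoke_tasklet unset never schedules the tasklet, which is the behaviour we would want for mlx4_en CQs.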