On Fri, 2 Oct 2020 09:56:31 +0200 Eric Dumazet wrote:
> On Thu, Oct 1, 2020 at 10:26 PM Jakub Kicinski <k...@kernel.org> wrote:
> >
> > On Thu, 1 Oct 2020 09:52:45 +0200 Eric Dumazet wrote:
> > > The unique work queue is a problem on server class platforms, with
> > > NUMA placement.
> > > We now have servers with NIC on different NUMA nodes.
> >
> > Are you saying that the wq code is less NUMA friendly than unpinned
> > threads?
>
> Yes this is what I am saying.
>
> Using a single and shared wq wont allow you to make sure :
> - work for NIC0 attached on NUMA node#0 will be using CPUS belonging to node#0
> - work for NIC1 attached on NUMA node#1 will be using CPUS belonging to node#1
>
>
> The only way you can tune things with a single wq is tweaking a single
> cpumask,
> that we can change with /sys/devices/virtual/workqueue/{wqname}/cpumask
> The same for the nice value with
> /sys/devices/virtual/workqueue/{wqname}/nice.
>
> In contrast, having kthreads let you tune things independently, if needed.
>
> Even with a single NIC, you can still need isolation between queues.
> We have queues dedicated to a certain kind of traffic/application.
>
> The work queue approach would need to be able to create/delete
> independent workqueues.
> But we tested the workqueue with a single NIC and our results gave to
> kthreads a win over the work queue.
Not according to the results Wei posted last night..

> Really, wq concept might be a nice abstraction when each work can be
> running for arbitrary durations,
> and arbitrary numbers of cpus, but with the NAPI model of up to 64
> packets at a time, and a fixed number of queues,

In my experiments the worker threads get stalled sooner or later.
And unless there is some work stealing going on latency spikes follow.

I would also not discount the variability in processing time. For a
budget of 64 the processing can take 0-500us per round, not counting
outliers.

> we should not add the work queue overhead.

Does this mean you're going to be against the (more fleshed out)
work queue implementation?
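
For reference, the single-cpumask tuning mentioned in the quoted text could be sketched roughly as below. This is only an illustration under assumptions: the helper names and the workqueue name `napi_wq` are hypothetical, and it assumes the workqueue is an unbound one exported through sysfs. The mask string follows the usual kernel bitmap convention of hex digits in comma-separated 32-bit groups. The sketch only builds the path and the mask; actually writing it would need root and an existing workqueue.

```python
def cpus_to_sysfs_mask(cpus):
    """Build a hex cpumask string (comma-separated 32-bit groups)
    from an iterable of CPU indices."""
    bits = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        bits |= 1 << cpu

    # Split the bitmap into 32-bit words, least significant first.
    groups = []
    while True:
        groups.append(bits & 0xFFFFFFFF)
        bits >>= 32
        if bits == 0:
            break
    groups.reverse()

    # Leading group is unpadded; the remaining groups are zero-padded
    # to 8 hex digits.
    head = format(groups[0], "x")
    tail = [format(g, "08x") for g in groups[1:]]
    return ",".join([head] + tail)


def workqueue_attr_path(wqname, attr):
    """Path of a per-workqueue sysfs attribute such as 'cpumask' or 'nice'."""
    return f"/sys/devices/virtual/workqueue/{wqname}/{attr}"


if __name__ == "__main__":
    # Hypothetical: restrict a workqueue named 'napi_wq' to CPUs 0-3.
    print(workqueue_attr_path("napi_wq", "cpumask"), cpus_to_sysfs_mask(range(4)))
```

Since the mask applies to the whole workqueue, every NIC sharing that workqueue gets the same CPU set, which is the limitation raised in the quoted text.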