On Thu, 1 Oct 2020 18:44:40 -0700 Wei Wang wrote:
> > Can you share the relative performance delta of this benchmark?
> >
> > Could you explain why threads are slower than ksoftirqd if you pin the
> > application away? From your cover letter it sounded like you want the
> > scheduler to see the NAPI load, but then you say you pinned the
> > application away from the NAPI cores for the test, so I'm confused.
>
> No. We did not explicitly pin the application threads away.
> Application threads are free to run anywhere. What we do is
> restrict the NAPI kthreads to only those CPUs handling rx interrupts.

Whatever. You pin the NAPI threads and hand-tune their number so the load
of the NAPI CPUs is always higher. If the workload changes, the system
will get very unhappy.

> (For us, 8 CPUs out of 56.) So the load on those CPUs is very high
> when running the test. And the scheduler is smart enough to avoid
> using those CPUs for the application threads automatically.
> Here are the results of one representative test:
>
>             cpu/op   50%tile   95%tile   99%tile
> base        71.47    417us     1.01ms    2.9ms
> kthread     67.84    396us     976us     2.4ms
> workqueue   69.68    386us     791us     1.9ms

Did you renice ksoftirqd in "base"?

> Actually, I remembered it wrong. It does seem workqueue is doing
> better on latencies. But cpu/op-wise, kthread seems to be a bit
> better.

Q.E.D.
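
For reference, here is a minimal sketch of the relative delta asked for at the top of the thread, computed from the single representative result quoted above. It assumes "base" is the reference point, converts all latencies to microseconds, and treats cpu/op as unitless because its units are not stated in the thread. The names RESULTS and relative_delta are illustrative only.

```python
# Representative results as quoted in the thread; latencies in microseconds.
# cpu/op units were not stated, so only the ratios between rows are meaningful.
# Note: the p99 inputs have only two significant figures, so those deltas are coarse.
RESULTS = {
    "base":      {"cpu/op": 71.47, "p50": 417.0, "p95": 1010.0, "p99": 2900.0},
    "kthread":   {"cpu/op": 67.84, "p50": 396.0, "p95": 976.0,  "p99": 2400.0},
    "workqueue": {"cpu/op": 69.68, "p50": 386.0, "p95": 791.0,  "p99": 1900.0},
}


def relative_delta(variant, baseline="base"):
    """Percent change of each metric vs. the baseline (negative means lower)."""
    base = RESULTS[baseline]
    return {k: 100.0 * (v - base[k]) / base[k] for k, v in RESULTS[variant].items()}


if __name__ == "__main__":
    for name in ("kthread", "workqueue"):
        deltas = relative_delta(name)
        print(name, " ".join(f"{k}={d:+.1f}%" for k, d in deltas.items()))
```

Relative to base, this gives roughly -5.1% cpu/op and -17.2% p99 for kthread, and -2.5% cpu/op and -34.5% p99 for workqueue. One run per configuration says nothing about variance, so these figures are indicative at best.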