On Thu, 1 Oct 2020 09:52:45 +0200 Eric Dumazet wrote:
> On Wed, Sep 30, 2020 at 10:08 PM Jakub Kicinski <k...@kernel.org> wrote:
> > On Wed, 30 Sep 2020 12:21:35 -0700 Wei Wang wrote:
> > > With napi poll moved to a kthread, the scheduler is in charge of
> > > scheduling both the kthreads handling network load and the user
> > > threads, and is able to make better decisions. In the previous
> > > benchmark, if we do this and pin the kthreads processing napi poll
> > > to specific CPUs, the scheduler is able to schedule user threads
> > > away from these CPUs automatically.
> > >
> > > And the reason we prefer 1 kthread per napi, instead of 1 workqueue
> > > entity per host, is that a kthread is more configurable than a
> > > workqueue: we can leverage existing tuning tools for threads, like
> > > taskset and chrt, to tune the scheduling class, cpu set, etc.
> > > Another reason is that if we eventually want to provide a busy poll
> > > feature using kernel threads for napi poll, a kthread seems more
> > > suitable than a workqueue.
> >
> > As I said in my reply to the RFC, I see better performance with the
> > workqueue implementation, so I would hold off until we have more
> > conclusive results there, as this set adds fairly strong uAPI that
> > we'll have to support forever.
>
> We can make incremental changes, the kthread implementation looks much
> nicer to us.
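(For reference, the taskset/chrt tuning Wei mentions boils down to the
sched_setaffinity() and sched_setscheduler() syscalls applied to the
kthread's pid. A minimal userspace sketch, assuming the pid and the
target cpu come in as command line arguments; how the pid is found, and
the thread naming, are up to the series:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
        cpu_set_t set;
        /* priority 50 is an arbitrary example value */
        struct sched_param sp = { .sched_priority = 50 };
        pid_t pid;

        if (argc < 3)
                return 1;
        pid = atoi(argv[1]);            /* pid of the napi kthread */

        CPU_ZERO(&set);
        CPU_SET(atoi(argv[2]), &set);   /* cpu to pin it to */

        /* what "taskset -pc <cpu> <pid>" does */
        if (sched_setaffinity(pid, sizeof(set), &set))
                perror("sched_setaffinity");

        /* what "chrt -f -p 50 <pid>" does */
        if (sched_setscheduler(pid, SCHED_FIFO, &sp))
                perror("sched_setscheduler");
        return 0;
}

The same per-thread knobs have no direct equivalent for work running on
a shared workqueue, which is the configurability argument being made.)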
Having done two implementations of something more wq-like now, I can
say with some confidence that it's quite likely not a simple extension
of this model. And since we'll likely need to support switching at
runtime, there will be a fast-path synchronization overhead (rough
sketch at the end of this mail).

> The unique work queue is a problem on server class platforms, with
> NUMA placement.
> We now have servers with NICs on different NUMA nodes.

Are you saying that the wq code is less NUMA friendly than unpinned
threads?

> We can not introduce a new model that will make all workloads better
> without any tuning.
> If you really think you can do that, think again.

Has Wei tested the wq implementation with real workloads?

All the cover letter has is some basic netperf runs and a vague
sentence saying "real workload also improved".

I think it's possible to get something that will be a better default
for 90% of workloads. Our current model predates SMP by two decades.
It's pretty bad.

I'm talking about upstream defaults, obviously; maybe you're starting
from a different baseline configuration than the rest of the world...

> Even the old 'fix' (commit 4cd13c21b207e80ddb1144c576500098f2d5f882
> "softirq: Let ksoftirqd do its job")
> had severe issues for latency sensitive jobs.
>
> We need to be able to opt in to threads, and let the process
> scheduler take decisions.
> If we believe the process scheduler takes bad decisions, it should
> be reported to scheduler experts.

I wouldn't expect the scheduler to learn all by itself how to group
processes that run identical code for cache efficiency, or how to
schedule at the 10us scale. I hope I'm wrong.

> I fully support this implementation, I do not want to wait for yet
> another 'work queue' model or scheduler classes.

I can't sympathize. I don't understand why you're trying to rush this,
and you're not giving me enough info about your target config to be
able to understand your thinking.
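To make the fast-path concern above concrete: once the backend can be
switched at runtime, every napi schedule has to test which backend
currently owns the instance, roughly along these lines (a sketch only;
NAPI_STATE_THREADED and n->thread are assumed names here, not
necessarily what this series uses):

/*
 * Hypothetical dispatch point that runtime switching implies.
 * The state test runs on every single napi schedule, hence the
 * fast-path cost.
 */
static void napi_schedule_backend(struct napi_struct *n)
{
        if (test_bit(NAPI_STATE_THREADED, &n->state)) {
                /* kthread backend: wake the per-napi thread */
                wake_up_process(n->thread);
                return;
        }
        /* default backend: raise the softirq, as today */
        __raise_softirq_irqoff(NET_RX_SOFTIRQ);
}

Adding a wq backend as a third case, or synchronizing a switch against
in-flight polls, is where it stops being a simple extension.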