On Thu, 1 Oct 2020 09:52:45 +0200 Eric Dumazet wrote:
> On Wed, Sep 30, 2020 at 10:08 PM Jakub Kicinski <k...@kernel.org> wrote:
> > On Wed, 30 Sep 2020 12:21:35 -0700 Wei Wang wrote:  
> > > With napi poll moved to kthreads, the scheduler is in charge of
> > > scheduling both the kthreads handling network load and the user
> > > threads, and is able to make better decisions. In the previous
> > > benchmark, if we do this and pin the kthreads processing napi poll
> > > to specific CPUs, the scheduler is able to schedule user threads
> > > away from these CPUs automatically.
> > >
> > > And the reason we prefer 1 kthread per napi, instead of 1 workqueue
> > > entity per host, is that a kthread is more configurable than a
> > > workqueue: we can leverage existing tuning tools for threads, like
> > > taskset and chrt, to tune the scheduling class, CPU set, etc.
> > > Another reason is that if we eventually want to provide a busy-poll
> > > feature using kernel threads for napi poll, a kthread seems more
> > > suitable than a workqueue.
> >
> > As I said in my reply to the RFC, I see better performance with the
> > workqueue implementation, so I would hold off until we have more
> > conclusive results there, as this set adds fairly strong uAPI that
> > we'll have to support forever.
> 
> We can make incremental changes; the kthread implementation looks much
> nicer to us.
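
(For reference, the taskset/chrt tuning quoted above boils down to
sched_setaffinity()/sched_setscheduler() on the kthread's pid, which is
why plain threads inherit the standard tooling. A runnable user-space
sketch; the pid, CPU and priority values are invented:)

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <sys/types.h>

	int main(void)
	{
		pid_t pid = 1234;	/* pid of a napi kthread (invented) */
		cpu_set_t set;
		struct sched_param sp = { .sched_priority = 50 };

		/* equivalent of: taskset -pc 2 1234 */
		CPU_ZERO(&set);
		CPU_SET(2, &set);
		if (sched_setaffinity(pid, sizeof(set), &set))
			perror("sched_setaffinity");

		/* equivalent of: chrt -f -p 50 1234 */
		if (sched_setscheduler(pid, SCHED_FIFO, &sp))
			perror("sched_setscheduler");
		return 0;
	}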

Having done two implementations of something more wq-like now
I can say with some confidence that it's quite likely not a
simple extension of this model. And since we'll likely need
to support switching at runtime, there will be fast-path
synchronization overhead.
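
(To make that overhead concrete, here is a sketch of the check that
runtime switching forces onto every schedule. The NAPI_STATE_THREADED
bit and the ->thread field are illustrative assumptions, not
necessarily this set's API:)

	/* Sketch only: if the mode can flip at runtime, every schedule
	 * has to test a flag before dispatching, with ordering that is
	 * safe against a concurrent mode switch, and that test sits on
	 * the hot path of every rx interrupt.
	 */
	static void napi_schedule_dispatch(struct napi_struct *napi)
	{
		if (test_bit(NAPI_STATE_THREADED, &napi->state))
			wake_up_process(napi->thread);	/* threaded mode */
		else	/* classic mode: run from net_rx_action() softirq */
			____napi_schedule(this_cpu_ptr(&softnet_data), napi);
	}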

> A single work queue is a problem on server-class platforms with
> NUMA placement.
> We now have servers with NICs on different NUMA nodes.

Are you saying that the wq code is less NUMA friendly than unpinned
threads?
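
(To be concrete about the comparison: a per-napi thread can be created
on the NIC's node via kthread_create_on_node() and dev_to_node(), both
existing APIs, while unpinned workers are free to migrate there too.
A sketch; napi_kthread_create(), the ->thread field and the use of the
set's napi_threaded_poll() as the thread function are assumptions:)

	/* Sketch: create the poll thread on the device's NUMA node so
	 * its task_struct and stack are node-local.
	 */
	static int napi_kthread_create(struct napi_struct *napi)
	{
		int node = dev_to_node(&napi->dev->dev);
		struct task_struct *t;

		t = kthread_create_on_node(napi_threaded_poll, napi, node,
					   "napi/%s-%u", napi->dev->name,
					   napi->napi_id);
		if (IS_ERR(t))
			return PTR_ERR(t);
		napi->thread = t;
		wake_up_process(t);	/* thread sleeps until a napi is scheduled */
		return 0;
	}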

> We cannot introduce a new model that will make all workloads better
> without any tuning.
> If you really think you can do that, think again.

Has Wei tested the wq implementation with real workloads?

All the cover letter has is some basic netperf runs and a vague
sentence saying "real workload also improved".

I think it's possible to get something that will be a better default
for 90% of workloads. Our current model predates SMP by two decades.
It's pretty bad.

I'm talking about upstream defaults, obviously; maybe you're starting
from a different baseline configuration than the rest of the world.

> Even the old 'fix' (commit 4cd13c21b207e80ddb1144c576500098f2d5f882
> "softirq: Let ksoftirqd do its job")
> had severe issues for latency-sensitive jobs.
> 
> We need to be able to opt in to threads, and let the process scheduler
> make the decisions.
> If we believe the process scheduler makes bad decisions, that should
> be reported to the scheduler experts.

I wouldn't expect the scheduler to learn all by itself how to group
processes that run identical code for cache efficiency, or how to
schedule at a 10us time scale. I hope I'm wrong.

> I fully support this implementation; I do not want to wait for yet
> another 'work queue' model or new scheduler classes.

I can't say I sympathize; I don't understand why you're trying to rush
this. And you're not giving me enough info about your target config to
be able to understand your thinking.
