On 27/01/19(Sun) 01:02, Mark Kettenis wrote:
> > Date: Sat, 26 Jan 2019 14:46:41 -0200
> > From: Martin Pieuchot <m...@openbsd.org>
> > 
> > On MP machines, when a CPU executes mi_switch() and doesn't have any
> > thread on its runqueue it will try to steal one from another CPU's
> > runqueue.  If it fails to steal a thread from another runqueue it
> > will pick its own Idle thread.
> > 
> > To decide which thread should be picked, the scheduler evaluate the
> > "cost" of moving all runnable threads to the current CPU and pick the
> > best one.  This is done by calling sched_proc_to_cpu_cost().  However
> > this function doesn't really makes sense in this context because the
> > destination CPU is always the same.  So all variables to determine
> > the cost of moving a thread are constant except the priority of the
> > given thread.
> > 
> > So I'd like to commit the diff below which makes clear what the
> > scheduler is currently doing: pick the runnable thread with higher
> > priority (lowest `p_priority' number).
> > 
> > This change is also necessary to improve the placement of threads
> > being awaken.  Because the real function of sched_proc_to_cpu_cost()
> > is to select a *CPU* where a thread is going to be run.  Since I don't
> > want to change the algorithm used to steal threads, let's stop using
> > sched_proc_to_cpu_cost() there.
> > 
> > Ok?
> 
> Hmm.  The idea has always been that sched_proc_to_cpu_cost() should
> also take into account the CPU the process last ran on.  The idea
> being that moving processes between cores of the same physical
> processor may very well be cheaper that moving between different
> physical processors since a bigger part of the cache hierarchy is
> shared.  See the comment above the function.
> 
> Now coming up with a good cost model for this isn't trivial,
> especially since we don't really have the right topology information
> available in the kernel.  But I'd still sat that calling
> sched_proc_to_cpu_cost() is the right function to call here.

I disagree with your conclusion.  Remember that calculating the cost is
not only a topological question.  So far distances between CPUs hasn't
been considered on OpenBSD.

Now, the comment on top of sched_proc_to_cpu_cost() says:

    "[...].  A highly loaded cpu might not be the best candidate for
     this proc since it won't get run."

This, like all the logic in this function, doesn't apply to stealing
threads.  Because a CPU tries to steal a thread when its own runqueue
is empty, so the stolen thread will run.

I agree that the existing cost model should be revisited, that's what
I'm trying to do! :)  However placing a thread on a CPU after a wakeup
is a different problem than stealing a thread.  It even depends on who
is doing the wakeup!

I'm trying to improve the placement of threads but don't want to change
the stealing logic for obvious reasons.

Placement of threads is trying to predict the future based on the past.
It is not only a topological question and since sched_proc_to_cpu_cost()
does not currently apply to stealing, I'd like to stop using it there to
be able to improve its algorithm for wakeup() & consider it for yield().

Does that make sense?

Reply via email to