On 27/01/19(Sun) 01:02, Mark Kettenis wrote: > > Date: Sat, 26 Jan 2019 14:46:41 -0200 > > From: Martin Pieuchot <m...@openbsd.org> > > > > On MP machines, when a CPU executes mi_switch() and doesn't have any > > thread on its runqueue it will try to steal one from another CPU's > > runqueue. If it fails to steal a thread from another runqueue it > > will pick its own Idle thread. > > > > To decide which thread should be picked, the scheduler evaluate the > > "cost" of moving all runnable threads to the current CPU and pick the > > best one. This is done by calling sched_proc_to_cpu_cost(). However > > this function doesn't really makes sense in this context because the > > destination CPU is always the same. So all variables to determine > > the cost of moving a thread are constant except the priority of the > > given thread. > > > > So I'd like to commit the diff below which makes clear what the > > scheduler is currently doing: pick the runnable thread with higher > > priority (lowest `p_priority' number). > > > > This change is also necessary to improve the placement of threads > > being awaken. Because the real function of sched_proc_to_cpu_cost() > > is to select a *CPU* where a thread is going to be run. Since I don't > > want to change the algorithm used to steal threads, let's stop using > > sched_proc_to_cpu_cost() there. > > > > Ok? > > Hmm. The idea has always been that sched_proc_to_cpu_cost() should > also take into account the CPU the process last ran on. The idea > being that moving processes between cores of the same physical > processor may very well be cheaper that moving between different > physical processors since a bigger part of the cache hierarchy is > shared. See the comment above the function. > > Now coming up with a good cost model for this isn't trivial, > especially since we don't really have the right topology information > available in the kernel. But I'd still sat that calling > sched_proc_to_cpu_cost() is the right function to call here.
I disagree with your conclusion. Remember that calculating the cost is not only a topological question. So far distances between CPUs hasn't been considered on OpenBSD. Now, the comment on top of sched_proc_to_cpu_cost() says: "[...]. A highly loaded cpu might not be the best candidate for this proc since it won't get run." This, like all the logic in this function, doesn't apply to stealing threads. Because a CPU tries to steal a thread when its own runqueue is empty, so the stolen thread will run. I agree that the existing cost model should be revisited, that's what I'm trying to do! :) However placing a thread on a CPU after a wakeup is a different problem than stealing a thread. It even depends on who is doing the wakeup! I'm trying to improve the placement of threads but don't want to change the stealing logic for obvious reasons. Placement of threads is trying to predict the future based on the past. It is not only a topological question and since sched_proc_to_cpu_cost() does not currently apply to stealing, I'd like to stop using it there to be able to improve its algorithm for wakeup() & consider it for yield(). Does that make sense?