> Date: Sun, 27 Jan 2019 12:56:34 -0200
> From: Martin Pieuchot <m...@openbsd.org>
>
> On 27/01/19(Sun) 01:02, Mark Kettenis wrote:
> > > Date: Sat, 26 Jan 2019 14:46:41 -0200
> > > From: Martin Pieuchot <m...@openbsd.org>
> > >
> > > On MP machines, when a CPU executes mi_switch() and doesn't have any
> > > thread on its runqueue it will try to steal one from another CPU's
> > > runqueue.  If it fails to steal a thread from another runqueue it
> > > will pick its own Idle thread.
> > >
> > > To decide which thread should be picked, the scheduler evaluates the
> > > "cost" of moving all runnable threads to the current CPU and picks
> > > the best one.  This is done by calling sched_proc_to_cpu_cost().
> > > However this function doesn't really make sense in this context
> > > because the destination CPU is always the same.  So all variables to
> > > determine the cost of moving a thread are constant except the
> > > priority of the given thread.
> > >
> > > So I'd like to commit the diff below which makes clear what the
> > > scheduler is currently doing: pick the runnable thread with the
> > > highest priority (lowest `p_priority' number).
> > >
> > > This change is also necessary to improve the placement of threads
> > > being awakened.  Because the real function of
> > > sched_proc_to_cpu_cost() is to select a *CPU* where a thread is
> > > going to be run.  Since I don't want to change the algorithm used to
> > > steal threads, let's stop using sched_proc_to_cpu_cost() there.
> > >
> > > Ok?
> >
> > Hmm.  The idea has always been that sched_proc_to_cpu_cost() should
> > also take into account the CPU the process last ran on.  The idea
> > being that moving processes between cores of the same physical
> > processor may very well be cheaper than moving between different
> > physical processors since a bigger part of the cache hierarchy is
> > shared.  See the comment above the function.
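[Editor's note: a minimal sketch of the behaviour the diff makes explicit, using simplified stand-in structures rather than the actual OpenBSD ones.  Once the destination CPU is fixed, every term of sched_proc_to_cpu_cost() is constant except the thread's priority, so stealing reduces to picking the runnable thread with the lowest `p_priority' number.]

```c
#include <stddef.h>

struct proc {
	int p_priority;		/* lower number == higher priority */
	struct proc *p_next;	/* hypothetical runqueue link */
};

/* Return the best thread to steal from a victim runqueue, or NULL. */
struct proc *
steal_best(struct proc *runqueue)
{
	struct proc *p, *best = NULL;

	for (p = runqueue; p != NULL; p = p->p_next)
		if (best == NULL || p->p_priority < best->p_priority)
			best = p;
	return best;
}
```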
> >
> > Now coming up with a good cost model for this isn't trivial,
> > especially since we don't really have the right topology information
> > available in the kernel.  But I'd still say that calling
> > sched_proc_to_cpu_cost() is the right function to call here.
>
> I disagree with your conclusion.  Remember that calculating the cost is
> not only a topological question.  So far distances between CPUs haven't
> been considered on OpenBSD.
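[Editor's note: a sketch of the cost model Mark describes, where migrating a thread between cores of the same physical package (shared cache) is cheaper than migrating across packages.  The weights and the package-id field are invented for illustration; as noted above, the kernel doesn't currently have this topology information.]

```c
#include <stddef.h>

#define COST_RESIDENT		0	/* stays on the CPU it last ran on */
#define COST_SAME_PACKAGE	10	/* sibling core, shared cache */
#define COST_CROSS_PACKAGE	40	/* different physical processor */

struct cpu_info {
	int ci_cpuid;
	int ci_pkgid;	/* hypothetical physical-package id */
};

int
migration_cost(const struct cpu_info *last, const struct cpu_info *dest)
{
	if (last == NULL || last->ci_cpuid == dest->ci_cpuid)
		return COST_RESIDENT;	/* no warm cache elsewhere to lose */
	if (last->ci_pkgid == dest->ci_pkgid)
		return COST_SAME_PACKAGE;
	return COST_CROSS_PACKAGE;
}
```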
It has been considered; just never implemented.

> Now, the comment on top of sched_proc_to_cpu_cost() says:
>
> "[...].  A highly loaded cpu might not be the best candidate for
> this proc since it won't get run."
>
> This, like all the logic in this function, doesn't apply to stealing
> threads.  Because a CPU tries to steal a thread when its own runqueue
> is empty, so the stolen thread will run.

Right.  But this will just add the same cost to each and every process
that we're trying to steal, so it won't make a difference.

> I agree that the existing cost model should be revisited, that's what
> I'm trying to do! :)  However placing a thread on a CPU after a wakeup
> is a different problem than stealing a thread.  It even depends on who
> is doing the wakeup!

Maybe, maybe not.  The problem is quite similar.  Is it a good idea to
incur the cost of moving a process to a different CPU in order to make
it run earlier?  We need some sort of process affinity to avoid
thrashing the CPU caches like crazy.

> I'm trying to improve the placement of threads but don't want to change
> the stealing logic for obvious reasons.

What kind of improvements are you thinking of?

> Placement of threads is trying to predict the future based on the past.
> It is not only a topological question and since sched_proc_to_cpu_cost()
> does not currently apply to stealing, I'd like to stop using it there to
> be able to improve its algorithm for wakeup() & consider it for yield().

Unfortunately predicting the future based on the past is probably the
best available strategy ;).  Anyway, maybe you can consider adding a
separate cost function for stealing?
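[Editor's note: a sketch of what such a separate stealing cost function might look like.  Victim load is irrelevant here (the thief's runqueue is empty, so whatever it steals will run), leaving the thread's priority plus an invented affinity penalty for abandoning the caches of the CPU the thread last ran on; lowest cost wins.  The structure and penalty weight are hypothetical.]

```c
#define STEAL_AFFINITY_PENALTY	8	/* hypothetical weight */

struct sproc {
	int p_priority;		/* lower number == higher priority */
	int p_lastcpu;		/* cpuid the thread last ran on */
};

int
steal_cost(const struct sproc *p, int thief_cpuid)
{
	int cost = p->p_priority;

	if (p->p_lastcpu != thief_cpuid)
		cost += STEAL_AFFINITY_PENALTY;
	return cost;
}
```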