On MP machines, when a CPU executes mi_switch() and doesn't have any thread on its runqueue, it will try to steal one from another CPU's runqueue. If it fails to steal a thread from another runqueue, it will pick its own idle thread.
To decide which thread should be picked, the scheduler evaluates the "cost" of moving each runnable thread to the current CPU and picks the best one. This is done by calling sched_proc_to_cpu_cost(). However, this function doesn't really make sense in this context because the destination CPU is always the same. So all the variables used to determine the cost of moving a thread are constant, except the priority of the given thread.

So I'd like to commit the diff below, which makes clear what the scheduler is currently doing: pick the runnable thread with the highest priority (lowest `p_priority' number).

This change is also necessary to improve the placement of threads being awakened, because the real purpose of sched_proc_to_cpu_cost() is to select a *CPU* on which a thread is going to run. Since I don't want to change the algorithm used to steal threads, let's stop using sched_proc_to_cpu_cost() there.

Ok?

Index: kern/kern_sched.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sched.c,v
retrieving revision 1.54
diff -u -p -r1.54 kern_sched.c
--- kern/kern_sched.c	17 Nov 2018 23:10:08 -0000	1.54
+++ kern/kern_sched.c	26 Jan 2019 16:30:27 -0000
@@ -494,8 +494,7 @@ sched_steal_proc(struct cpu_info *self)
 				if (p->p_flag & P_CPUPEG)
 					continue;
 
-				cost = sched_proc_to_cpu_cost(self, p);
-
+				cost = p->p_priority;
 				if (best == NULL || cost < bestcost) {
 					best = p;
 					bestcost = cost;
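
For anyone who wants to see the resulting selection rule in isolation, here is a minimal userland sketch of it. It is not the kernel code: struct sk_thread, SK_CPUPEG and sk_pick_steal() are hypothetical stand-ins for struct proc, P_CPUPEG and the loop in sched_steal_proc(), and a flat array stands in for the other CPUs' runqueues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for P_CPUPEG: thread is pegged to its CPU. */
#define SK_CPUPEG	0x1

/* Hypothetical stand-in for the struct proc fields used by the diff. */
struct sk_thread {
	int flags;	/* stand-in for p_flag */
	int priority;	/* stand-in for p_priority; lower number wins */
};

/*
 * Return the non-pegged thread with the lowest priority number,
 * or NULL if there is nothing to steal.  On a tie the first thread
 * seen is kept, because the comparison is a strict "<".
 */
const struct sk_thread *
sk_pick_steal(const struct sk_thread *threads, size_t n)
{
	const struct sk_thread *best = NULL;
	int bestcost = 0;
	size_t i;

	assert(threads != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct sk_thread *p = &threads[i];
		int cost;

		/* Pegged threads must stay where they are. */
		if (p->flags & SK_CPUPEG)
			continue;

		/* The "cost" is now just the thread's priority. */
		cost = p->priority;
		if (best == NULL || cost < bestcost) {
			best = p;
			bestcost = cost;
		}
	}
	return best;
}
```

A NULL return in this sketch corresponds to the case where stealing fails and the CPU falls back to its own idle thread.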