On 7/4/19 6:04 PM, Parth Shah wrote:
The same experiment with hackbench, analyzed with perf, shows an increase in cache miss rates with these patches (lower is better):

                         Baseline(%)    Patch(%)
-----------------------  -------------  -----------
Total Cache miss rate    17.01          19 (-11%)
L1 icache miss rate      5.45           6.7 (-22%)

So is it possible for the idle-CPU search to try checking the target CPU first and then go to the sliding window if no idle CPU is found? The diff below works as expected on an IBM POWER9 system and resolves the problem of far wakeups to a large extent.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff2e9b5c3ac5..fae035ce1162 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6161,6 +6161,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	u64 time, cost;
 	s64 delta;
 	int cpu, limit, floor, target_tmp, nr = INT_MAX;
+	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
@@ -6198,16 +6199,22 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	time = local_clock();
 
-	for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
+	cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed);
+	for_each_cpu_wrap(cpu, cpu_smt_mask(target), target) {
+		__cpumask_clear_cpu(cpu, cpus);
+		if (available_idle_cpu(cpu))
+			goto idle_cpu_exit;
+	}
+
+	for_each_cpu_wrap(cpu, cpus, target_tmp) {
 		per_cpu(next_cpu, target) = cpu;
 		if (!--nr)
 			return -1;
-		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
-			continue;
 		if (available_idle_cpu(cpu))
 			break;
 	}
+idle_cpu_exit:
 	time = local_clock() - time;
 	cost = this_sd->avg_scan_cost;
 	delta = (s64)(time - cost) / 8;

Best,
Parth
How about calling select_idle_smt() before select_idle_cpu() from select_idle_sibling()? That should have the same effect.
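For illustration, a minimal, untested sketch of that reordering, assuming the mainline structure of select_idle_sibling() (select_idle_core(), then select_idle_cpu(), then select_idle_smt()) and the current helper signatures; the fast-path checks at the top of the function are elided:

	/*
	 * Untested sketch: probe the target core's SMT siblings via
	 * select_idle_smt() before the full LLC scan, so an idle
	 * sibling is preferred over a potentially far idle CPU.
	 */
	i = select_idle_core(p, sd, target);
	if ((unsigned)i < nr_cpumask_bits)
		return i;

	i = select_idle_smt(p, target);		/* moved before the LLC scan */
	if ((unsigned)i < nr_cpumask_bits)
		return i;

	i = select_idle_cpu(p, sd, target);
	if ((unsigned)i < nr_cpumask_bits)
		return i;

	return target;

That keeps the sibling probe bounded by the SMT width and only pays for the sliding-window LLC scan when the whole target core is busy.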

