On 12/05/17 21:57, Jeffrey Hugo wrote:
> On 5/12/2017 2:47 PM, Peter Zijlstra wrote:
>> On Fri, May 12, 2017 at 11:01:37AM -0600, Jeffrey Hugo wrote:
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index d711093..8f783ba 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8219,8 +8219,19 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>>             /* All tasks on this runqueue were pinned by CPU affinity */
>>>           if (unlikely(env.flags & LBF_ALL_PINNED)) {
>>> +            struct cpumask tmp;
>>
>> You cannot have cpumask's on stack.
> 
> Well, we need a temp variable to store the intermediate values since the
> cpumask_* operations are somewhat limited, and require a "storage"
> parameter.
> 
> Do you have any suggestions to meet all of these requirements?

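What about using env.dst_grpmask and checking whether cpus is a subset
(not necessarily a proper one) of env.dst_grpmask? If it is, every CPU
left to try is in the destination group and there is nothing more to
redo; cpumask_subset() also needs no temporary storage. For this we have
to stop setting env.dst_grpmask = NULL for CPU_NEWLY_IDLE, which IMHO is
not an issue since the idle type is passed via env into can_migrate_task().
And cpus has to be and'ed with sched_domain_span(env.sd); otherwise active
CPUs outside the domain, which are never picked as busiest and so never
cleared, would keep cpus from ever becoming a subset of env.dst_grpmask.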

I'm not sure whether this works with 'not fully connected NUMA'
(SD_OVERLAP), though...

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a903276fcb62..2ede4c1c9db8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6737,10 +6737,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
                 * our sched_group. We may want to revisit it if we couldn't
                 * meet load balance goals by pulling other tasks on src_cpu.
                 *
-                * Also avoid computing new_dst_cpu if we have already computed
-                * one in current iteration.
+                * Avoid computing new_dst_cpu for NEWLY_IDLE or if we have
+                * already computed one in current iteration.
                 */
-               if (!env->dst_grpmask || (env->flags & LBF_DST_PINNED))
+               if (env->idle == CPU_NEWLY_IDLE || (env->flags & LBF_DST_PINNED))
                        return 0;
 
                /* Prevent to re-select dst_cpu via env's cpus */
@@ -8091,14 +8091,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
                .tasks          = LIST_HEAD_INIT(env.tasks),
        };
 
-       /*
-        * For NEWLY_IDLE load_balancing, we don't need to consider
-        * other cpus in our group
-        */
-       if (idle == CPU_NEWLY_IDLE)
-               env.dst_grpmask = NULL;
-
-       cpumask_copy(cpus, cpu_active_mask);
+       cpumask_and(cpus, cpu_active_mask, sched_domain_span(env.sd));
 
        schedstat_inc(sd->lb_count[idle]);
 
@@ -8220,7 +8213,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
                /* All tasks on this runqueue were pinned by CPU affinity */
                if (unlikely(env.flags & LBF_ALL_PINNED)) {
                        cpumask_clear_cpu(cpu_of(busiest), cpus);
-                       if (!cpumask_empty(cpus)) {
+                       if (!cpumask_subset(cpus, env.dst_grpmask)) {
                                env.loop = 0;
                                env.loop_break = sched_nr_migrate_break;
                                goto redo;
