On 21/09/20 08:24, Vincent Guittot wrote:
> Some UCs like 9 always running tasks on 8 CPUs can't be balanced and the
> load balancer currently migrates the waiting task between the CPUs in an
> almost random manner. The success of a rq pulling a task depends on the
> value of nr_balance_failed of its domains and its ability to be faster
> than others to detach it. This behavior results in an unfair distribution
> of the running time between tasks because some CPUs will run most of the
> time, if not always, the same task whereas others will share their time
> between several tasks.
>
> Instead of using nr_balance_failed as a boolean to relax the condition
> for detaching a task, the LB will use nr_balance_failed to relax the
> threshold between the task's load and the imbalance. This mechanism
> prevents the same rq or domain from always winning the load balance fight.
>
> Reviewed-by: Phil Auld <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>

Reviewed-by: Valentin Schneider <[email protected]>

