On 19/03/21 16:19, Vincent Guittot wrote:
> On Mon, 15 Mar 2021 at 20:18, Valentin Schneider
> <[email protected]> wrote:
>> As stated the current behaviour is to classify groups as group_misfit_task
>> regardless of the dst_cpu's capacity. When we see a group_misfit_task
>> candidate group misfit task with higher per-CPU capacity than the local
>> group, we don't pick it as busiest.
>>
>> I initially thought not marking those as group_misfit_task was the right
>> thing to do, as they could then be classified as group_fully_busy or
>> group_has_spare. Consider:
>>
>>   DIE [          ]
>>   MC  [    ][    ]
>>        0  1  2  3
>>        L  L  B  B
>>
>>   arch_scale_capacity(L) < arch_scale_capacity(B)
>>
>>   CPUs 0-1 are idle / lightly loaded
>>   CPU2 has a misfit task and a few very small tasks
>>   CPU3 has a few very small tasks
>>
>> When CPU0 is running load_balance() at DIE level, right now we'll classify
>> the [2-3] group as group_misfit_task and not pick it as busiest because the
>> local group has a lower CPU capacity.
>>
>> If we didn't do that, we could leave the misfit task alone and pull some
>> small task(s) from CPU2 or CPU3, which would be a good thing to
>
> Are you sure? the last check in update_sd_pick_busiest() should
> already filter this. So it should be enough to let it be classify
> correctly
>
> A group should be classified as group_misfit_task when there is a task
> to migrate in priority compared to some other groups. In your case,
> you tag it as group_misfit_task but in order to do the opposite, i.e.
> make sure to not select it. As mentioned above, this will be filter in
> the last check in update_sd_pick_busiest()
>
This hinges on sgc->min_capacity, which might be influenced by a CPU in
the candidate group being severely squeezed by IRQ / thermal / RT / DL
pressure. That said, you have a point in that this check and the one in
find_busiest_queue() catch most scenarios I can think of.

Let me ponder this some more, and throw it at the test infrastructure
monster if I go down that route.
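
To make the min_capacity concern concrete, here is a rough userspace
model of that last check as I read it. It is a sketch, not the kernel
code: struct group_model and candidate_filtered() are made-up names for
illustration, the capacity numbers used below are arbitrary, and the ~20%
margin in fits_capacity() is my assumption of what the kernel helper
does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-group capacity data (sgc). */
struct group_model {
	unsigned long min_capacity;	/* lowest per-CPU capacity left in the group */
};

/* Assumed margin of ~20%: true if 'cap' is noticeably smaller than 'max'. */
static bool fits_capacity(unsigned long cap, unsigned long max)
{
	return cap * 1280 < max * 1024;
}

/* true if 'sg' has a noticeably smaller min capacity than 'ref'. */
static bool group_smaller_min_cpu_capacity(const struct group_model *sg,
					   const struct group_model *ref)
{
	return fits_capacity(sg->min_capacity, ref->min_capacity);
}

/*
 * Model of the final filter: on an asymmetric system, a candidate that is
 * at most fully busy is skipped when the local group is smaller, since
 * pulling from it would move tasks to less capable CPUs.
 */
static bool candidate_filtered(bool asym_cpucapacity, bool at_most_fully_busy,
			       const struct group_model *local,
			       const struct group_model *candidate)
{
	return asym_cpucapacity && at_most_fully_busy &&
	       group_smaller_min_cpu_capacity(local, candidate);
}
```

With a little group at min_capacity 446 and a big group at 1024, the
candidate is filtered as expected. If one big CPU is pressured down to,
say, 500, the comparison no longer holds and the big group is not
filtered, which is the corner case I am worried about.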

