The CFS CPULIMIT hotslice feature (sysctl_sched_vcpu_hotslice) defers
the tg->nr_cpus_active decrement when a task goes to sleep by arming
a per-cfs_rq hrtimer instead of decrementing immediately.  This avoids
bouncing the active CPU count for workloads with frequent sleep/wake
cycles.

The timer is armed in dec_nr_active_cfs_rqs() only when its "postpone"
argument is non-zero:

    /* dequeue_entity(): */
    if (!cfs_rq->load.weight)
        dec_nr_active_cfs_rqs(cfs_rq, flags & DEQUEUE_TASK_SLEEP);

The DEQUEUE_TASK_SLEEP flag (0x10) is distinct from DEQUEUE_SLEEP (0x01)
on purpose: DEQUEUE_SLEEP is set by __schedule() for the sleeping task
but is also unconditionally added when walking up the cgroup hierarchy
(flags |= DEQUEUE_SLEEP in the for_each_sched_entity loop), so parent
group entities receive it even during migration.  DEQUEUE_TASK_SLEEP is
meant to be set once at the top of dequeue_task_fair() and propagated
unchanged, so dec_nr_active_cfs_rqs() can distinguish "cfs_rq became
empty because a task slept" from "cfs_rq became empty as a side-effect
of migration".
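
The flag propagation described above can be sketched as a small
userspace model.  This is an illustration only, not kernel code: the
function flags_at_top_level() and its "depth" parameter are hypothetical,
and only the two flag values are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as quoted in the text above. */
#define DEQUEUE_SLEEP       0x01
#define DEQUEUE_TASK_SLEEP  0x10

/*
 * Hypothetical model: walk "depth" levels of the cgroup hierarchy the
 * way the for_each_sched_entity loop is described above and return the
 * flags seen at the topmost level.
 */
static int flags_at_top_level(int flags, bool task_sleep, int depth)
{
    assert(depth > 0);

    /* Set once, before the walk, and never touched again. */
    if (task_sleep)
        flags |= DEQUEUE_TASK_SLEEP;

    /* Every parent entity gets DEQUEUE_SLEEP, even during migration. */
    for (int level = 1; level < depth; level++)
        flags |= DEQUEUE_SLEEP;

    return flags;
}
```

In this model a migration (task_sleep false) still reaches the parent
levels with DEQUEUE_SLEEP set, while DEQUEUE_TASK_SLEEP stays clear, which
is why only the latter is usable as the "postpone" condition.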

In the original vz7 implementation (commit aebebb312b47, "sched: Port
diff-sched-make-nr_cpus-limit-support-hierarchies"), dequeue_task_fair()
contained:

    if (task_sleep)
        flags |= DEQUEUE_TASK_SLEEP;

When the CPULIMIT feature was ported to vz9 (commit 831465734a10,
"sched: Port CONFIG_CFS_CPULIMIT feature"), dec_nr_active_cfs_rqs() was
moved from dequeue_task_fair() into dequeue_entity(), but this
assignment was not carried over.  As a result:

  - flags & DEQUEUE_TASK_SLEEP is always 0
  - postpone is always 0
  - the active_timer is never armed
  - sched_cfs_active_timer() never fires
  - sysctl_sched_vcpu_hotslice has no effect

The entire hotslice optimization has been dead code since the vz9 port.
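
The failure chain can be reproduced in a simplified userspace model.
Everything prefixed with "model_" is hypothetical, and the model assumes
that task_sleep is derived from DEQUEUE_SLEEP and that a non-zero
"postpone" always arms the timer.  The real code may apply further
conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEQUEUE_SLEEP       0x01
#define DEQUEUE_TASK_SLEEP  0x10

/* Hypothetical stand-in for the state touched when a cfs_rq empties. */
struct model_cfs_rq {
    bool timer_armed;     /* models the active_timer being armed */
    int  nr_cpus_active;  /* models tg->nr_cpus_active */
};

/* Models dec_nr_active_cfs_rqs(): defer the decrement when postponed. */
static void model_dec_nr_active(struct model_cfs_rq *cfs_rq, int postpone)
{
    if (postpone)
        cfs_rq->timer_armed = true;   /* decrement left to the timer */
    else
        cfs_rq->nr_cpus_active--;     /* immediate decrement */
}

/* Models dequeue_task_fair() for a task whose cfs_rq becomes empty. */
static void model_dequeue(struct model_cfs_rq *cfs_rq, int flags,
                          bool with_fix)
{
    int task_sleep = flags & DEQUEUE_SLEEP;  /* assumed derivation */

    assert(cfs_rq != NULL);

    /* The assignment that was lost in the vz9 port. */
    if (with_fix && task_sleep)
        flags |= DEQUEUE_TASK_SLEEP;

    model_dec_nr_active(cfs_rq, flags & DEQUEUE_TASK_SLEEP);
}
```

Calling model_dequeue() with DEQUEUE_SLEEP and with_fix == false
decrements the counter at once and never arms the timer.  With
with_fix == true the counter is left alone and the timer is armed.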

Restore the lost assignment.

Fixes: 831465734a10 ("sched: Port CONFIG_CFS_CPULIMIT feature")
https://virtuozzo.atlassian.net/browse/VSTOR-126785

Signed-off-by: Konstantin Khorenko <[email protected]>

Feature: sched: ability to limit number of CPUs available to a CT
---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8ed4cfa0dc83e..6d0d4457110ff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6473,6 +6473,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
        int idle_h_nr_running = task_has_idle_policy(p);
        bool was_sched_idle = sched_idle_rq(rq);
 
+       if (task_sleep)
+               flags |= DEQUEUE_TASK_SLEEP;
+
        util_est_dequeue(&rq->cfs, p);
 
        for_each_sched_entity(se) {
-- 
2.43.0
