The CFS CPULIMIT hotslice feature (sysctl_sched_vcpu_hotslice) defers
the tg->nr_cpus_active decrement when a task goes to sleep by arming
a per-cfs_rq hrtimer instead of decrementing immediately. This avoids
bouncing the active CPU count for workloads with frequent sleep/wake
cycles.

The timer is armed in dec_nr_active_cfs_rqs() only when its "postpone"
argument is non-zero:

    /* dequeue_entity(): */
    if (!cfs_rq->load.weight)
        dec_nr_active_cfs_rqs(cfs_rq, flags & DEQUEUE_TASK_SLEEP);

The DEQUEUE_TASK_SLEEP flag (0x10) is distinct from DEQUEUE_SLEEP (0x01)
on purpose: DEQUEUE_SLEEP is set by __schedule() for the sleeping task
but is also unconditionally added when walking up the cgroup hierarchy
(flags |= DEQUEUE_SLEEP in the for_each_sched_entity loop), so parent
group entities receive it even during migration. DEQUEUE_TASK_SLEEP is
meant to be set once at the top of dequeue_task_fair() and propagated
unchanged, so dec_nr_active_cfs_rqs() can distinguish "cfs_rq became
empty because a task slept" from "cfs_rq became empty as a side-effect
of migration".

In the original vz7 implementation (commit aebebb312b47 ("sched: Port
diff-sched-make-nr_cpus-limit-support-hierarchies")), dequeue_task_fair()
contained:

    if (task_sleep)
        flags |= DEQUEUE_TASK_SLEEP;

When the CPULIMIT feature was ported to vz9 (commit 831465734a10 ("sched:
Port CONFIG_CFS_CPULIMIT feature")), the dec_nr_active_cfs_rqs() call was
moved from dequeue_task_fair() into dequeue_entity(), but these lines were
not carried over. As a result:
- flags & DEQUEUE_TASK_SLEEP is always 0
- postpone is always 0
- the active_timer is never armed
- sched_cfs_active_timer() never fires
- sysctl_sched_vcpu_hotslice has no effect

The entire hotslice optimization has been dead code since the vz9 port.

Restore the lost assignment at the top of dequeue_task_fair().

Fixes: 831465734a10 ("sched: Port CONFIG_CFS_CPULIMIT feature")
https://virtuozzo.atlassian.net/browse/VSTOR-126785
Signed-off-by: Konstantin Khorenko <[email protected]>
Feature: sched: ability to limit number of CPUs available to a CT
---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8ed4cfa0dc83e..6d0d4457110ff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6473,6 +6473,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	int idle_h_nr_running = task_has_idle_policy(p);
 	bool was_sched_idle = sched_idle_rq(rq);
 
+	if (task_sleep)
+		flags |= DEQUEUE_TASK_SLEEP;
+
 	util_est_dequeue(&rq->cfs, p);
 
 	for_each_sched_entity(se) {
--
2.43.0
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel