Looks good.
Reviewed-by: Pavel Tikhomirov <[email protected]>
On 3/13/26 22:38, Konstantin Khorenko wrote:
> The CFS CPULIMIT hotslice feature (sysctl_sched_vcpu_hotslice) defers
> the tg->nr_cpus_active decrement when a task goes to sleep by arming
> a per-cfs_rq hrtimer instead of decrementing immediately. This avoids
> bouncing the active CPU count for workloads with frequent sleep/wake
> cycles.
>
> The timer is armed in dec_nr_active_cfs_rqs() only when its "postpone"
> argument is non-zero:
>
>     /* dequeue_entity(): */
>     if (!cfs_rq->load.weight)
>             dec_nr_active_cfs_rqs(cfs_rq, flags & DEQUEUE_TASK_SLEEP);
>
> The DEQUEUE_TASK_SLEEP flag (0x10) is distinct from DEQUEUE_SLEEP (0x01)
> on purpose: DEQUEUE_SLEEP is set by __schedule() for the sleeping task
> but is also unconditionally added when walking up the cgroup hierarchy
> (flags |= DEQUEUE_SLEEP in the for_each_sched_entity loop), so parent
> group entities receive it even during migration. DEQUEUE_TASK_SLEEP is
> meant to be set once at the top of dequeue_task_fair() and propagated
> unchanged, so dec_nr_active_cfs_rqs() can distinguish "cfs_rq became
> empty because a task slept" from "cfs_rq became empty as a side-effect
> of migration".
>
> In the original vz7 implementation (commit aebebb312b47 ("sched: Port
> diff-sched-make-nr_cpus-limit-support-hierarchies")), dequeue_task_fair()
> contained:
>
>     if (task_sleep)
>             flags |= DEQUEUE_TASK_SLEEP;
>
> When the CPULIMIT feature was ported to vz9 (commit 831465734a10
> ("sched: Port CONFIG_CFS_CPULIMIT feature")), dec_nr_active_cfs_rqs() was
> moved from dequeue_task_fair() into dequeue_entity(), but this
> assignment was not carried over. As a result:
>
> - flags & DEQUEUE_TASK_SLEEP is always 0
> - postpone is always 0
> - the active_timer is never armed
> - sched_cfs_active_timer() never fires
> - sysctl_sched_vcpu_hotslice has no effect
>
> The entire hotslice optimization has been dead code since the vz9 port.
>
> Restore the lost assignment.
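
For anyone following along, here is a small user-space sketch of the flag propagation described above. It is only a toy model, not the kernel code: model_dequeue(), MAX_DEPTH and the with_fix switch are hypothetical names made up for illustration, and it assumes that every level's cfs_rq becomes empty during the walk. The two flag values are the ones quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as quoted in the commit message. */
#define DEQUEUE_SLEEP		0x01
#define DEQUEUE_TASK_SLEEP	0x10

#define MAX_DEPTH 8	/* hypothetical cap on the toy hierarchy depth */

/*
 * Toy model of the hierarchy walk in dequeue_task_fair().
 *
 * flags:        flags passed by the caller (DEQUEUE_SLEEP for a sleeping
 *               task, 0 for a migration in this model)
 * depth:        number of levels walked, task's own cfs_rq first
 * with_fix:     whether the restored assignment is present
 * sleep_levels: out, number of levels that observed DEQUEUE_SLEEP
 *
 * Returns the number of levels at which the "postpone" argument
 * (flags & DEQUEUE_TASK_SLEEP) would be non-zero.
 */
static int model_dequeue(int flags, int depth, bool with_fix, int *sleep_levels)
{
	int task_sleep = flags & DEQUEUE_SLEEP;
	int postponed = 0;

	assert(depth > 0 && depth <= MAX_DEPTH);
	*sleep_levels = 0;

	/* The assignment restored by the patch: set once, before the walk. */
	if (with_fix && task_sleep)
		flags |= DEQUEUE_TASK_SLEEP;

	for (int level = 0; level < depth; level++) {
		if (flags & DEQUEUE_SLEEP)
			(*sleep_levels)++;
		if (flags & DEQUEUE_TASK_SLEEP)
			postponed++;

		/* Parent entities always see DEQUEUE_SLEEP, even on migration. */
		flags |= DEQUEUE_SLEEP;
	}

	return postponed;
}
```

In this model, a sleeping task on a three-level hierarchy gives a non-zero postpone at all three levels with the fix and at none without it. A migration gives zero at every level either way, even though the two parent levels still observe DEQUEUE_SLEEP, which is why testing DEQUEUE_SLEEP directly would not work.
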
>
> Fixes: 831465734a10 ("sched: Port CONFIG_CFS_CPULIMIT feature")
> https://virtuozzo.atlassian.net/browse/VSTOR-126785
>
> Signed-off-by: Konstantin Khorenko <[email protected]>
>
> Feature: sched: ability to limit number of CPUs available to a CT
> ---
> kernel/sched/fair.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8ed4cfa0dc83e..6d0d4457110ff 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6473,6 +6473,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	int idle_h_nr_running = task_has_idle_policy(p);
>  	bool was_sched_idle = sched_idle_rq(rq);
>
> +	if (task_sleep)
> +		flags |= DEQUEUE_TASK_SLEEP;
> +
>  	util_est_dequeue(&rq->cfs, p);
>
>  	for_each_sched_entity(se) {
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.