Looks good.

Reviewed-by: Pavel Tikhomirov <[email protected]>

On 3/13/26 22:38, Konstantin Khorenko wrote:
> The CFS CPULIMIT hotslice feature (sysctl_sched_vcpu_hotslice) defers
> the tg->nr_cpus_active decrement when a task goes to sleep by arming
> a per-cfs_rq hrtimer instead of decrementing immediately.  This avoids
> bouncing the active CPU count for workloads with frequent sleep/wake
> cycles.
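A rough user-space sketch of the deferral described above. It assumes that a wakeup inside the hotslice simply cancels the pending decrement; this patch does not show the actual vz9 timer handling. All model_* names are hypothetical, and a plain counter stands in for hrtimer time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not kernel code. */
struct model_tg {
	int nr_cpus_active;          /* stands in for tg->nr_cpus_active */
};

struct model_cfs_rq {
	struct model_tg *tg;
	bool active;                 /* currently counted in nr_cpus_active */
	bool timer_armed;            /* stands in for a pending active_timer */
	uint64_t timer_expires;
};

static void model_inc_active(struct model_cfs_rq *rq)
{
	if (rq->timer_armed) {
		/* Assumption: waking within the hotslice cancels the decrement. */
		rq->timer_armed = false;
		return;
	}
	if (!rq->active) {
		rq->active = true;
		rq->tg->nr_cpus_active++;
	}
}

static void model_dec_active(struct model_cfs_rq *rq, int postpone,
			     uint64_t now, uint64_t hotslice)
{
	if (postpone) {
		/* Defer: decrement only if still idle when the timer expires. */
		rq->timer_armed = true;
		rq->timer_expires = now + hotslice;
		return;
	}
	if (rq->active) {
		rq->active = false;
		rq->tg->nr_cpus_active--;
	}
}

/* Models the timer callback being checked at time "now". */
static void model_timer_tick(struct model_cfs_rq *rq, uint64_t now)
{
	if (rq->timer_armed && now >= rq->timer_expires) {
		rq->timer_armed = false;
		model_dec_active(rq, 0, now, 0);
	}
}
```

With postpone set, a sleep/wake cycle shorter than the hotslice never touches nr_cpus_active; with postpone always 0, every sleep decrements immediately.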
> 
> The timer is armed in dec_nr_active_cfs_rqs() only when its "postpone"
> argument is non-zero:
> 
>     /* dequeue_entity(): */
>     if (!cfs_rq->load.weight)
>         dec_nr_active_cfs_rqs(cfs_rq, flags & DEQUEUE_TASK_SLEEP);
> 
> The DEQUEUE_TASK_SLEEP flag (0x10) is distinct from DEQUEUE_SLEEP (0x01)
> on purpose: DEQUEUE_SLEEP is set by __schedule() for the sleeping task
> but is also unconditionally added when walking up the cgroup hierarchy
> (flags |= DEQUEUE_SLEEP in the for_each_sched_entity loop), so parent
> group entities receive it even during migration.  DEQUEUE_TASK_SLEEP is
> meant to be set once at the top of dequeue_task_fair() and propagated
> unchanged, so dec_nr_active_cfs_rqs() can distinguish "cfs_rq became
> empty because a task slept" from "cfs_rq became empty as a side-effect
> of migration".
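The propagation argument can be checked with a small model of the for_each_sched_entity() walk. This sketch assumes task_sleep is derived from DEQUEUE_SLEEP, as in mainline, and that every level's cfs_rq empties so the walk reaches the top. model_dequeue_walk() is hypothetical, not the kernel function:

```c
#include <assert.h>

#define DEQUEUE_SLEEP       0x01  /* value as stated in the commit message */
#define DEQUEUE_TASK_SLEEP  0x10  /* value as stated in the commit message */

#define MODEL_MAX_LEVELS 8

/*
 * Records in flags_seen[i] the flags dequeue_entity() would receive at
 * hierarchy level i (0 = the task's own cfs_rq).
 */
static void model_dequeue_walk(int flags, int levels, int flags_seen[])
{
	int task_sleep = flags & DEQUEUE_SLEEP;

	if (task_sleep)                 /* the assignment the patch restores */
		flags |= DEQUEUE_TASK_SLEEP;

	for (int i = 0; i < levels && i < MODEL_MAX_LEVELS; i++) {
		flags_seen[i] = flags;
		flags |= DEQUEUE_SLEEP; /* added for parents unconditionally */
	}
}
```

For a migration (flags == 0), parent levels see DEQUEUE_SLEEP but never DEQUEUE_TASK_SLEEP; for a real sleep, every level sees DEQUEUE_TASK_SLEEP.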
> 
> In the original vz7 implementation, commit aebebb312b47 ("sched: Port
> diff-sched-make-nr_cpus-limit-support-hierarchies"), dequeue_task_fair()
> contained:
> 
>     if (task_sleep)
>         flags |= DEQUEUE_TASK_SLEEP;
> 
> When the CPULIMIT feature was ported to vz9 in commit 831465734a10
> ("sched: Port CONFIG_CFS_CPULIMIT feature"), the dec_nr_active_cfs_rqs()
> call was moved from dequeue_task_fair() into dequeue_entity(), but the
> assignment above was not carried over.  As a result:
> 
>   - flags & DEQUEUE_TASK_SLEEP is always 0
>   - postpone is always 0
>   - the active_timer is never armed
>   - sched_cfs_active_timer() never fires
>   - sysctl_sched_vcpu_hotslice has no effect
> 
> The entire hotslice optimization has been dead code since the vz9 port.
> 
> Restore the lost assignment.
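A minimal model of the postpone value that dec_nr_active_cfs_rqs() receives, with and without the restored assignment. model_postpone() is hypothetical and assumes task_sleep is flags & DEQUEUE_SLEEP:

```c
#include <assert.h>
#include <stdbool.h>

#define DEQUEUE_SLEEP       0x01  /* value as stated in the commit message */
#define DEQUEUE_TASK_SLEEP  0x10  /* value as stated in the commit message */

/* Returns what dequeue_entity() would pass as "postpone". */
static int model_postpone(int flags, bool assignment_restored)
{
	int task_sleep = flags & DEQUEUE_SLEEP;

	if (assignment_restored && task_sleep)
		flags |= DEQUEUE_TASK_SLEEP;

	return flags & DEQUEUE_TASK_SLEEP;
}
```

Without the assignment, postpone is 0 for both sleep and migration, so the timer is never armed; with it, only a genuine sleep defers the decrement.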
> 
> Fixes: 831465734a10 ("sched: Port CONFIG_CFS_CPULIMIT feature")
> https://virtuozzo.atlassian.net/browse/VSTOR-126785
> 
> Signed-off-by: Konstantin Khorenko <[email protected]>
> 
> Feature: sched: ability to limit number of CPUs available to a CT
> ---
>  kernel/sched/fair.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8ed4cfa0dc83e..6d0d4457110ff 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6473,6 +6473,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>       int idle_h_nr_running = task_has_idle_policy(p);
>       bool was_sched_idle = sched_idle_rq(rq);
>  
> +     if (task_sleep)
> +             flags |= DEQUEUE_TASK_SLEEP;
> +
>       util_est_dequeue(&rq->cfs, p);
>  
>       for_each_sched_entity(se) {

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.

_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel
