The commit is pushed to "branch-rh10-6.12.0-55.52.1.5.x.vz10-ovz" and will
appear at [email protected]:openvz/vzkernel.git
after rh10-6.12.0-55.52.1.5.10.vz10
------>
commit cc4e435edc28da8953001618c1c13bd2919a6456
Author: Dmitry Sepp <[email protected]>
Date: Thu Mar 19 09:47:20 2026 +0000
sched: Add missing cpus_read_lock() in tg_set_cpu_limit()
tg_set_cpu_limit() calls __tg_set_cfs_bandwidth(), which iterates over
for_each_online_cpu(i) and takes per-CPU rq locks. However,
tg_set_cpu_limit() does not hold cpus_read_lock().
The requirement to hold cpus_read_lock() was introduced by the upstream
commit 0e59bdaea75f ("sched/fair: Disable runtime_enabled on dying rq"),
which changed the iteration in __tg_set_cfs_bandwidth() from
for_each_possible_cpu to for_each_online_cpu and added
get_online_cpus()/put_online_cpus() around the call. This was done to
prevent a race between setting cfs_rq->runtime_enabled and
unthrottle_offline_cfs_rqs().
If a CPU goes offline while __tg_set_cfs_bandwidth() is executing inside
tg_set_cpu_limit(), the function may re-enable runtime_enabled on a
dying CPU's cfs_rq after unthrottle_offline_cfs_rqs() has already
cleared it, leaving tasks stranded on a dead CPU with no way to
migrate.
The bug was inherited from the original commit
4514c5835d32f ("sched: Port CONFIG_CFS_CPULIMIT feature"),
where tg_set_cpu_limit() was ported from vz7 (kernel 3.10) without
accounting for the changed locking requirements. In the vz7 kernel,
__tg_set_cfs_bandwidth() used for_each_possible_cpu, so cpus_read_lock()
was not needed.
Fixes: 4514c5835d32f ("sched: Port CONFIG_CFS_CPULIMIT feature")
https://virtuozzo.atlassian.net/browse/VSTOR-127251
Signed-off-by: Dmitry Sepp <[email protected]>
======
Patchset description:
sched: Clean up vCPU handling code
The idea behind the change is to transition from the existing spatial
vCPU handling approach, which introduces costly modifications to the
scheduling logic to ensure the requested CPU count is obeyed
(10%+ performance drop in some tests, see below), to
temporal isolation, which can be provided by the cgroup2 cpu.max.
Reference test results:
1. Clean setup, no vCPU related modifications:
~/at_process_ctxswitch_pipe -w -p 2 -t 15
rate_total: 856509.625000, avg: 428254.812500
2. vCPU related modifications (present state):
~/at_process_ctxswitch_pipe -w -p 2 -t 15
rate_total: 735626.812500, avg: 367813.406250
3. Cleaned-up vCPU handling:
~/at_process_ctxswitch_pipe -w -p 2 -t 15
rate_total: 840074.750000, avg: 420037.375000
Feature: sched: ability to limit number of CPUs available to a CT
---
kernel/sched/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0423c1b323caf..36cef7e6bfebb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10030,6 +10030,7 @@ static int tg_set_cpu_limit(struct task_group *tg, unsigned int nr_cpus)
quota = max(quota, min_cfs_quota_period);
}
+ cpus_read_lock();
mutex_lock(&cfs_constraints_mutex);
ret = __tg_set_cfs_bandwidth(tg, period, quota, burst);
if (!ret) {
@@ -10037,6 +10038,7 @@ static int tg_set_cpu_limit(struct task_group *tg, unsigned int nr_cpus)
tg->nr_cpus = nr_cpus;
}
mutex_unlock(&cfs_constraints_mutex);
+ cpus_read_unlock();
return ret;
}
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel