The commit is pushed to "branch-rh10-6.12.0-55.52.1.5.x.vz10-ovz" and will
appear at [email protected]:openvz/vzkernel.git
after rh10-6.12.0-55.52.1.5.10.vz10
------>
commit cc4e435edc28da8953001618c1c13bd2919a6456
Author: Dmitry Sepp <[email protected]>
Date: Thu Mar 19 09:47:20 2026 +0000
sched: Add missing cpus_read_lock() in tg_set_cpu_limit()
tg_set_cpu_limit() calls __tg_set_cfs_bandwidth(), which iterates over
for_each_online_cpu(i) and takes per-CPU rq locks. However,
tg_set_cpu_limit() does not hold cpus_read_lock().
The requirement to hold cpus_read_lock() was introduced by the upstream
commit 0e59bdaea75f ("sched/fair: Disable runtime_enabled on dying rq"),
which changed the iteration in __tg_set_cfs_bandwidth() from
for_each_possible_cpu to for_each_online_cpu and added
get_online_cpus()/put_online_cpus() around the call. This was done to
prevent a race between setting cfs_rq->runtime_enabled and
unthrottle_offline_cfs_rqs().
If a CPU goes offline while __tg_set_cfs_bandwidth() is executing inside
tg_set_cpu_limit(), the function may re-enable runtime_enabled on a
dying CPU's cfs_rq after unthrottle_offline_cfs_rqs() has already
cleared it, leaving tasks stranded on a dead CPU with no way to
migrate.
The bug was inherited from the original commit
4514c5835d32f ("sched: Port CONFIG_CFS_CPULIMIT feature"),
where tg_set_cpu_limit() was ported from vz7 (kernel 3.10) without
accounting for the changed locking requirements. In the vz7 kernel,
__tg_set_cfs_bandwidth() used for_each_possible_cpu, so cpus_read_lock()
was not needed.
Fixes: 4514c5835d32f ("sched: Port CONFIG_CFS_CPULIMIT feature")
https://virtuozzo.atlassian.net/browse/VSTOR-127251
Signed-off-by: Dmitry Sepp <[email protected]>
======
Patchset description:
sched: Clean up vCPU handling code
The idea behind the change is to transition from the existing spatial
vCPU handling approach, which introduces costly modifications to the
scheduling logic to ensure the requested CPU count is obeyed
(10%+ performance drop in some tests, see below), to
temporal isolation, which can be provided by the cgroup2 cpu.max.
Reference test results:
1. Clean setup, no vCPU related modifications:
~/at_process_ctxswitch_pipe -w -p 2 -t 15
rate_total: 856509.625000, avg: 428254.812500
2. vCPU related modifications (present state):
~/at_process_ctxswitch_pipe -w -p 2 -t 15
rate_total: 735626.812500, avg: 367813.406250
3. Cleaned-up vCPU handling:
~/at_process_ctxswitch_pipe -w -p 2 -t 15
rate_total: 840074.750000, avg: 420037.375000
Feature: sched: ability to limit number of CPUs available to a CT
---
kernel/sched/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0423c1b323caf..36cef7e6bfebb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10030,6 +10030,7 @@ static int tg_set_cpu_limit(struct task_group *tg, unsigned int nr_cpus)
quota = max(quota, min_cfs_quota_period);
}
+ cpus_read_lock();
mutex_lock(&cfs_constraints_mutex);
ret = __tg_set_cfs_bandwidth(tg, period, quota, burst);
if (!ret) {
@@ -10037,6 +10038,7 @@ static int tg_set_cpu_limit(struct task_group *tg, unsigned int nr_cpus)
tg->nr_cpus = nr_cpus;
}
mutex_unlock(&cfs_constraints_mutex);
+ cpus_read_unlock();
return ret;
}
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel