As reported by sashiko [1], cpuset_update_tasks_nodemask() will do
mpol_rebind_mm() and possibly cpuset_migrate_mm() for all threads of
a multithreaded process. Since commit 3df9ca0a2b8b ("cpuset: migrate
memory only for threadgroup leaders"), cpuset_attach() has been updated
to rebind and migrate memory only for threadgroup leaders to mark the
group leader as the owner of the mm_struct.

To be consistent and avoid unnecessary performance overhead for heavily
multithreaded processes, follow the cpuset_attach() example and perform
memory rebind and migration only for threadgroup leaders.

Also add a paragraph in cgroup-v2.rst under cpuset.mems stating that the
threadgroup leader is the memory owner of that threadgroup. Non-leading
threads therefore shouldn't be placed in other cgroups whose
"cpuset.mems" doesn't fully overlap that of the group leader.

[1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com

Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Ridong Chen <[email protected]>
---
 Documentation/admin-guide/cgroup-v2.rst | 7 +++++++
 kernel/cgroup/cpuset.c                  | 4 ++++
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 993446ab66d0..f9c353174a7e 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2527,6 +2527,13 @@ Cpuset Interface Files
 	a need to change "cpuset.mems" with active tasks, it shouldn't
 	be done frequently.
 
+	For a multithreaded process, the threadgroup leader is
+	considered the owner of the group's memory. Memory policy
+	rebinding and migration will only happen with respect to the
+	threadgroup leader. To avoid unexpected result, non-leading
+	threads shouldn't be put into another cgroup whose "cpuset.mems"
+	doesn't fully overlap that of the threadgroup leader.
+
   cpuset.mems.effective
 	A read-only multiple values file which exists on all
 	cpuset-enabled cgroups.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 044ddbf66f8e..055ae54a040a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2673,6 +2673,10 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
 
 		cpuset_change_task_nodemask(task, &newmems);
 
+		/* Rebind and migrate mm only for thread group leader */
+		if (!thread_group_leader(task))
+			continue;
+
 		mm = get_task_mm(task);
 		if (!mm)
 			continue;
-- 
2.54.0

