As reported by sashiko [1], cpuset_update_tasks_nodemask() calls
mpol_rebind_mm() and possibly cpuset_migrate_mm() for every thread of
a multithreaded process, even though all threads in a thread group
share the same mm_struct. Since commit 3df9ca0a2b8b ("cpuset: migrate
memory only for threadgroup leaders"), cpuset_attach() rebinds and
migrates memory only for threadgroup leaders, treating the group
leader as the owner of the mm_struct.

To be consistent and avoid unnecessary performance overhead for heavily
multithreaded processes, follow the cpuset_attach() example and perform
memory rebind and migration only for threadgroup leaders.

Also add a paragraph to cgroup-v2.rst under cpuset.mems stating that
the threadgroup leader is the memory owner of its threadgroup.
Therefore, non-leader threads shouldn't be placed in other cgroups
whose "cpuset.mems" doesn't fully overlap that of the group leader.

[1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com

Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Ridong Chen <[email protected]>
---
 Documentation/admin-guide/cgroup-v2.rst | 7 +++++++
 kernel/cgroup/cpuset.c                  | 4 ++++
 2 files changed, 11 insertions(+)
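
For illustration only (not part of the patch): a minimal userspace
sketch of the overhead argument above. It counts how often the mm
rebind/migrate step would run with and without the leader-only filter.
All names here (sim_task, sim_thread_group_leader(),
sim_count_mm_updates()) are hypothetical, and the leader test uses the
userspace-visible convention that a leader's TID equals its TGID, which
is an assumption of this sketch and not how the kernel helper is
implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; NOT the kernel's task_struct. */
struct sim_task {
	int pid;	/* thread ID */
	int tgid;	/* thread group ID; equals pid for the group leader */
};

/*
 * Simplified leader check for this sketch only: a thread is treated as
 * the group leader when its thread ID equals its thread group ID.
 */
static bool sim_thread_group_leader(const struct sim_task *t)
{
	return t->pid == t->tgid;
}

/*
 * Walk all tasks and count how many times the simulated mm
 * rebind/migrate step would run. With leaders_only set, non-leader
 * threads are skipped, mirroring the check added by this patch.
 */
static size_t sim_count_mm_updates(const struct sim_task *tasks, size_t n,
				   bool leaders_only)
{
	size_t updates = 0;

	for (size_t i = 0; i < n; i++) {
		/* The per-task nodemask change would still happen here. */
		if (leaders_only && !sim_thread_group_leader(&tasks[i]))
			continue;
		updates++;	/* stands in for rebind + optional migrate */
	}
	return updates;
}
```

With one three-thread process and one single-threaded process in the
cpuset, the unfiltered walk performs four mm updates while the
leader-only walk performs two, one per mm_struct.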

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 993446ab66d0..f9c353174a7e 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2527,6 +2527,13 @@ Cpuset Interface Files
        a need to change "cpuset.mems" with active tasks, it shouldn't
        be done frequently.
 
+       For a multithreaded process, the threadgroup leader is
+       considered the owner of the group's memory. Memory policy
+       rebinding and migration will only happen with respect to the
+       threadgroup leader. To avoid unexpected results, non-leading
+       threads shouldn't be put into another cgroup whose "cpuset.mems"
+       doesn't fully overlap that of the threadgroup leader.
+
   cpuset.mems.effective
        A read-only multiple values file which exists on all
        cpuset-enabled cgroups.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 044ddbf66f8e..055ae54a040a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2673,6 +2673,10 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
 
                cpuset_change_task_nodemask(task, &newmems);
 
+               /* Rebind and migrate mm only for thread group leader */
+               if (!thread_group_leader(task))
+                       continue;
+
                mm = get_task_mm(task);
                if (!mm)
                        continue;
-- 
2.54.0

