The end goal is to see the following two things:
the jobs under the slurmstepd cgroup path, and
at least cpu, cpuset, and memory listed in the cgroup.controllers file
within the job cgroups.
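A quick way to check the second point is to test whether a cgroup.controllers line contains all of the controllers Slurm needs. This is a minimal sketch; the helper name and the example cgroup path are assumptions, and the exact layout depends on your systemd slice and Slurm version.

```shell
# Return 0 if the given cgroup.controllers contents include the
# controllers Slurm's cgroup v2 plugin expects, nonzero otherwise.
required_controllers_present() {
  # $1: contents of a cgroup.controllers file, e.g. "cpuset cpu memory pids"
  for c in cpuset cpu memory; do
    case " $1 " in
      *" $c "*) ;;        # controller present, keep checking
      *) return 1 ;;      # controller missing
    esac
  done
}

# On a node, roughly (path is hypothetical):
#   required_controllers_present "$(cat /sys/fs/cgroup/system.slice/cgroup.controllers)"
```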
The pattern you have would be the processes left over after boot, from the
first failed slurmd service start.
Thanks for the hint.
So you end up with two "slurmstepd infinity" processes, like I did when I
tried this workaround?
[root@node ~]# ps aux | grep slurm
root 1833 0.0 0.0 33716 2188 ? Ss 21:02 0:00
/usr/sbin/slurmstepd infinity
root 2259 0.0 0.0 236796 12108 ? Ss
There needs to be a slurmstepd infinity process running before slurmd starts.
This doc goes into it:
https://slurm.schedmd.com/cgroup_v2.html
There is probably a better way to do this, but this is what we do to deal with it:
::
files/slurm-cgrepair.service
::
[Unit]
Before=slurmd.service
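The fragment above is cut off; for context, one plausible shape of such a "cgroup repair" unit is sketched below. The Description, ExecStart command, and controller list are assumptions inferred from the thread (the unit's apparent job is to delegate the missing controllers before slurmd starts), not the poster's actual file; verify paths against your own system.

```ini
# Hypothetical completion of files/slurm-cgrepair.service
[Unit]
Description=Enable cpu/cpuset/memory cgroup controllers before slurmd starts
Before=slurmd.service

[Service]
Type=oneshot
# Delegate the controllers slurmd expects to find; the target path
# (system.slice) is an assumption and may differ on your distribution.
ExecStart=/bin/sh -c 'echo +cpuset +cpu +memory > /sys/fs/cgroup/system.slice/cgroup.subtree_control'

[Install]
WantedBy=multi-user.target
```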
I observe the same behavior on Slurm 23.11.5 / Rocky Linux 8.9:
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
> memory pids
> [root@compute ~]# systemctl disable slurmd
> Removed /etc/systemd/system/multi-user.target.wants/slurmd.service.
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.su
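The subtree_control output quoted above shows only memory and pids delegated. A minimal sketch of enabling the missing controllers follows; the function name is hypothetical, and on a real node the target is the standard cgroup v2 mount (run as root). The directory is taken as a parameter only so the sketch can be exercised safely.

```shell
# Delegate the cpu, cpuset and memory controllers to the children of a
# cgroup by writing to its cgroup.subtree_control file.
enable_controllers() {
  echo "+cpuset +cpu +memory" > "$1/cgroup.subtree_control"
}

# On a real node (as root), roughly:
#   enable_controllers /sys/fs/cgroup
```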