[slurm-users] Re: Slurmd enabled crash with CgroupV2

2024-04-11 Thread Williams, Jenny Avis via slurm-users
The end goal is to see two things: jobs under the slurmstepd cgroup path, and at least cpu, cpuset, and memory listed in the cgroup.controllers file within each job's cgroup. The pattern you have would be the processes left after boot, first failed slurmd service start which l
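
The controller check described above can be sketched in shell. This is only an illustration, not something from the thread: the helper name has_controllers is hypothetical, and it assumes the file you point it at uses the space-separated, single-line format that cgroup v2 uses for cgroup.controllers and cgroup.subtree_control.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeed only if every named
# controller is listed in the given cgroup v2 controllers file.
# Usage: has_controllers FILE CONTROLLER...
has_controllers() {
    file=$1
    shift
    # Return 2 if the file is missing or unreadable (e.g. no cgroup v2 mount).
    [ -r "$file" ] || return 2
    # Pad with spaces so that "cpu" does not match inside "cpuset".
    enabled=" $(tr '\n' ' ' < "$file") "
    for ctrl in "$@"; do
        case "$enabled" in
            *" $ctrl "*) ;;
            *) echo "missing controller: $ctrl" >&2; return 1 ;;
        esac
    done
    return 0
}
```

On a node, something like `has_controllers /sys/fs/cgroup/cgroup.subtree_control cpu cpuset memory` would report the first missing controller on stderr; the path of an individual job's cgroup.controllers file depends on the local setup and is left to the reader.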

[slurm-users] Re: Slurmd enabled crash with CgroupV2

2024-04-11 Thread Josef Dvoracek via slurm-users
Thanks for the hint. So you end up with two "slurmstepd infinity" processes, like I did when I tried this workaround?

[root@node ~]# ps aux | grep slurm
root    1833  0.0  0.0  33716  2188 ?    Ss   21:02   0:00 /usr/sbin/slurmstepd infinity
root    2259  0.0  0.0 236796 12108 ?    Ss   

[slurm-users] Re: Slurmd enabled crash with CgroupV2

2024-04-11 Thread Williams, Jenny Avis via slurm-users
There needs to be a slurmstepd infinity process running before slurmd starts. This doc goes into it: https://slurm.schedmd.com/cgroup_v2.html

There is probably a better way to do this, but this is what we do to deal with that:

::
files/slurm-cgrepair.service
::
[Unit]
Before=slurmd

[slurm-users] Re: Slurmd enabled crash with CgroupV2

2024-04-11 Thread Josef Dvoracek via slurm-users
I observe the same behavior on Slurm 23.11.5, Rocky Linux 8.9.

> [root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
> memory pids
> [root@compute ~]# systemctl disable slurmd
> Removed /etc/systemd/system/multi-user.target.wants/slurmd.service.
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.su