I reinstalled SLURM 22.05.3, restarted slurmd, and now it's working:
# dnf reinstall slurm slurm-slurmd slurm-devel slurm-pam_slurm
# systemctl restart slurmd

The dnf.log shows that the versions were the same, so there was no mismatch or anything:

2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-22.05.3-1.el8.x86_64
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-devel-22.05.3-1.el8.x86_64
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-pam_slurm-22.05.3-1.el8.x86_64
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-slurmd-22.05.3-1.el8.x86_64

So I'm not sure what's going on... anyway, at least it's working now!

Regards,

On Tue, Aug 16, 2022 at 12:53 PM Alan Orth <alan.o...@gmail.com> wrote:
> Dear list,
>
> I've been using cgroupsv2 with SLURM 22.05 on CentOS Stream 8 successfully
> for a few months now. Recently a few of my nodes have started having
> problems starting slurmd. The log shows:
>
> [2022-08-16T20:52:58.439] slurmd version 22.05.3 started
> [2022-08-16T20:52:58.439] error: Controller cpuset is not enabled!
> [2022-08-16T20:52:58.439] error: Controller cpu is not enabled!
> [2022-08-16T20:52:58.439] error: cpu cgroup controller is not available.
> [2022-08-16T20:52:58.439] error: There's an issue initializing memory or cpu controller
> [2022-08-16T20:52:58.439] error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed
> [2022-08-16T20:52:58.439] error: cannot create jobacct_gather context for jobacct_gather/cgroup
> [2022-08-16T20:52:58.439] fatal: Unable to initialize jobacct_gather
>
> The system has cgroupsv2 enabled as far as I can tell:
>
> # cat /sys/fs/cgroup/cgroup.controllers
> cpuset cpu io memory hugetlb pids rdma
> # [ $(stat -fc %T /sys/fs/cgroup/) = "cgroup2fs" ] && echo "unified" || ( [ -e /sys/fs/cgroup/unified/ ] && echo "hybrid" || echo "legacy")
> unified
>
> And my slurm.conf has:
>
> ProctrackType=proctrack/cgroup
> TaskPlugin=task/affinity,task/cgroup
>
> And cgroup.conf:
>
> CgroupAutomount=yes
> CgroupPlugin=autodetect
>
> What else should I look for before giving up and reverting to cgroupsv1?
> My current version is 22.05.3, but it was happening in 22.05.2 as well.
>
> Thank you for any advice.
> --
> Alan Orth
> alan.o...@gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch

--
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
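For anyone who hits the same "Controller cpuset is not enabled!" error, one extra check beyond `cat /sys/fs/cgroup/cgroup.controllers` may be worth doing. Under cgroup v2, `cgroup.controllers` only lists the controllers *available* to a cgroup, while `cgroup.subtree_control` lists the ones actually *enabled* for its children, so seeing `cpuset cpu` in the root `cgroup.controllers` does not by itself show that they are enabled further down the tree where slurmd runs. The function below is a minimal sketch, assuming a cgroup v2 unified hierarchy; `check_cgroup_controllers` is a hypothetical helper name (not part of Slurm), and it has not been confirmed that this is exactly what slurmd checks.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of Slurm): compare the controllers a
# cgroup v2 directory offers (cgroup.controllers) with the ones it has enabled
# for its children (cgroup.subtree_control).
# Usage: check_cgroup_controllers [DIR]   (DIR defaults to /sys/fs/cgroup)
# Exit status: 0 = all checked controllers enabled, 1 = something missing,
#              2 = the cgroup files could not be read.
check_cgroup_controllers() {
    dir="${1:-/sys/fs/cgroup}"
    available=$(cat "$dir/cgroup.controllers") || return 2
    enabled=$(cat "$dir/cgroup.subtree_control") || return 2
    result=0
    # The controllers named in the slurmd errors quoted above
    for ctrl in cpuset cpu memory; do
        # Pad with spaces so "cpu" does not match inside "cpuset"
        case " $available " in
            *" $ctrl "*) ;;
            *) echo "$ctrl: not available in $dir"; result=1; continue ;;
        esac
        case " $enabled " in
            *" $ctrl "*) echo "$ctrl: enabled for children of $dir" ;;
            *) echo "$ctrl: available but not enabled in $dir/cgroup.subtree_control"; result=1 ;;
        esac
    done
    return "$result"
}
```

Example usage: `check_cgroup_controllers /sys/fs/cgroup`, then the same for an intermediate level such as `/sys/fs/cgroup/system.slice` on a systemd host, since a controller has to be enabled at every level above the cgroup that wants to use it.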