For what it's worth, I've rolled back to cgroups v1 on CentOS Stream 8. I
will be watching future SLURM release notes carefully to see if anything
changes here, and following other people's experiences on the list.
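
For anyone hitting the same "Controller cpuset is not enabled!" errors, here
is a minimal sketch of one more check that may be worth running before
rolling back. On cgroups v2 the root cgroup.controllers file lists what the
kernel offers, but a child cgroup only receives the controllers its parent
lists in cgroup.subtree_control. Whether that was the cause on my nodes is
only a guess. The missing_controllers helper and the CGROUP_ROOT variable are
names made up for this example; they are not part of SLURM or systemd.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM): print the controllers from the
# argument list that are NOT named in the given cgroup v2 control file.
# A missing or unreadable file counts as "nothing enabled".
missing_controllers() {
    file=$1; shift
    # Pad with spaces so every controller name can be matched as a whole word.
    have=" $(cat "$file" 2>/dev/null) "
    missing=""
    for c in "$@"; do
        case "$have" in
            *" $c "*) ;;                      # controller is listed
            *) missing="$missing $c" ;;       # controller is absent
        esac
    done
    echo "${missing# }"
}

# CGROUP_ROOT is an assumed override for testing; default is the usual mount.
root=${CGROUP_ROOT:-/sys/fs/cgroup}
for f in cgroup.controllers cgroup.subtree_control; do
    m=$(missing_controllers "$root/$f" cpuset cpu memory)
    if [ -n "$m" ]; then
        echo "$f: missing $m"
    else
        echo "$f: ok"
    fi
done
```

If cgroup.controllers is fine but cgroup.subtree_control is missing cpuset or
cpu, the controllers exist but are not being delegated to child cgroups,
which would at least be consistent with the slurmd errors below.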

Regards,



On Wed, Aug 17, 2022 at 12:36 AM Alan Orth <alan.o...@gmail.com> wrote:

> Thanks for the advice. I checked munge's log on the system that was most
> recently affected and found a few hundred of these:
>
> 2022-08-16 23:30:56 +0300 Info:      Unauthorized credential for client UID=0 GID=0
>
> Not sure if relevant. NTP on the system is synced. I'll keep an eye on
> munge in the future...
>
> Thanks again,
>
> On Tue, Aug 16, 2022 at 1:45 PM Timony, Mick <
> michael_tim...@hms.harvard.edu> wrote:
>
>> When I see odd behaviour I've found it sometimes related to either NTP
>> issues (the time is off) or munge errors:
>>
>>    - Is NTP running and is the time accurate?
>>    - Look for munge errors:
>>       - /var/log/munge/munged.log
>>       - sudo systemctl status munge
>>
>> If it's a munge error, usually restarting munge does the trick:
>>
>> sudo systemctl restart munge
>>
>> Regards
>> --Mick
>> ------------------------------
>> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
>> Alan Orth <alan.o...@gmail.com>
>> *Sent:* Tuesday, August 16, 2022 4:36 PM
>> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
>> *Subject:* Re: [slurm-users] Problems with cgroupsv2
>>
>> I re-installed SLURM 22.05.3 and then restarted slurmd and now it's
>> working:
>>
>> # dnf reinstall slurm slurm-slurmd slurm-devel slurm-pam_slurm
>> # systemctl restart slurmd
>>
>> The dnf.log shows that the versions were the same, so there was no
>> mismatch or anything:
>>
>> 2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-22.05.3-1.el8.x86_64
>> 2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-devel-22.05.3-1.el8.x86_64
>> 2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-pam_slurm-22.05.3-1.el8.x86_64
>> 2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-slurmd-22.05.3-1.el8.x86_64
>>
>> So I'm not sure what's going on... anyways, at least it's working now!
>>
>> Regards,
>>
>> On Tue, Aug 16, 2022 at 12:53 PM Alan Orth <alan.o...@gmail.com> wrote:
>>
>> Dear list,
>>
>> I've been using cgroupsv2 with SLURM 22.05 on CentOS Stream 8
>> successfully for a few months now. Recently a few of my nodes have started
>> having problems starting slurmd. The log shows:
>>
>> [2022-08-16T20:52:58.439] slurmd version 22.05.3 started
>> [2022-08-16T20:52:58.439] error: Controller cpuset is not enabled!
>> [2022-08-16T20:52:58.439] error: Controller cpu is not enabled!
>> [2022-08-16T20:52:58.439] error: cpu cgroup controller is not available.
>> [2022-08-16T20:52:58.439] error: There's an issue initializing memory or cpu controller
>> [2022-08-16T20:52:58.439] error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed
>> [2022-08-16T20:52:58.439] error: cannot create jobacct_gather context for jobacct_gather/cgroup
>> [2022-08-16T20:52:58.439] fatal: Unable to initialize jobacct_gather
>>
>> The system has cgroupsv2 enabled as far as I can tell:
>>
>> # cat /sys/fs/cgroup/cgroup.controllers
>> cpuset cpu io memory hugetlb pids rdma
>> # [ $(stat -fc %T /sys/fs/cgroup/) = "cgroup2fs" ] && echo "unified" || ( [ -e /sys/fs/cgroup/unified/ ] && echo "hybrid" || echo "legacy")
>> unified
>>
>> And my slurm.conf has:
>>
>> ProctrackType=proctrack/cgroup
>> TaskPlugin=task/affinity,task/cgroup
>>
>> And cgroup.conf:
>>
>> CgroupAutomount=yes
>> CgroupPlugin=autodetect
>>
>> What else should I look for before giving up and reverting to cgroupsv1?
>> My current version is 22.05.3, but it was happening in 22.05.2 as well.
>>
>> Thank you for any advice.
>> --
>> Alan Orth
>> alan.o...@gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>>
>>
>
>
> --
> Alan Orth
> alan.o...@gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
>


-- 
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
