Additional configuration information -- /etc/slurm/cgroup.conf

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
CgroupPlugin=cgroup/v2
AllowedSwapSpace=1
ConstrainSwapSpace=yes
ConstrainDevices=yes
From: Williams, Jenny Avis
Sent: Tuesday, July 11, 2023 10:47 AM
To: slurm-us...@schedmd.com
Subject: cgroupv2 + slurmd - external cgroup changes needed to get daemon to start

Progress on getting slurmd to start under cgroupv2.

Issue: slurmd 22.05.6 will not start when using cgroupv2.

Expected result: even after a reboot, slurmd will start up without needing lines manually added to files under /sys/fs/cgroup.

When started as a service, the error is:

# systemctl status slurmd
* slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/slurmd.service.d
           `-extendUnit.conf
   Active: failed (Result: exit-code) since Tue 2023-07-11 10:29:23 EDT; 2s ago
  Process: 11395 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 11395 (code=exited, status=1/FAILURE)

Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: Started Slurm node daemon.
Jul 11 10:29:23 g1803jles01.ll.unc.edu slurmd[11395]: slurmd: slurmd version 22.05.6 started
Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Failed with result 'exit-code'.

When started at the command line, the output is:

# slurmd -D -vvv 2>&1 |egrep error
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: cpu cgroup controller is not available.
slurmd: error: There's an issue initializing memory or cpu controller
slurmd: error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed
slurmd: error: cannot create jobacct_gather context for jobacct_gather/cgroup
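The controller state slurmd is complaining about can be read directly from the cgroup v2 filesystem. A minimal check from a root shell, assuming the /sys/fs/cgroup mountpoint used in this configuration -- cgroup.controllers lists what the kernel makes available, and the two cgroup.subtree_control files are the ones the mitigation script below appends +cpu +cpuset +memory to:

# cat /sys/fs/cgroup/cgroup.controllers
# cat /sys/fs/cgroup/cgroup.subtree_control
# cat /sys/fs/cgroup/system.slice/cgroup.subtree_control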
Steps to mitigate the issue:

While the following steps do not solve the issue, they do get the system into a state where slurmd will start, at least until the next reboot. The reinstall of slurm-slurmd is a one-time step to ensure that local service modifications are out of the picture. Currently, even after a reboot, the cgroup echo steps are necessary at a minimum.

#!/bin/bash
/usr/bin/dnf -y reinstall slurm-slurmd
systemctl daemon-reload
/usr/bin/pkill -f '/usr/sbin/slurmstepd infinity'
systemctl enable slurmd
systemctl stop dcismeng.service && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/cgroup.subtree_control && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/system.slice/cgroup.subtree_control && \
systemctl start slurmd && \
echo 'run this: systemctl start dcismeng'

Environment:

# scontrol show config
Configuration data as of 2023-07-11T10:39:48
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,qos,safe
AccountingStorageHost = m1006
AccountingStorageExternalHost = (null)
AccountingStorageParameters = (null)
AccountingStoragePort = 6819
AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreFlags = (null)
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInterconnectType = acct_gather_interconnect/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
AllowSpecResourcesUsage = No
AuthAltTypes = (null)
AuthAltParameters = (null)
AuthInfo = (null)
AuthType = auth/munge
BatchStartTimeout = 10 sec
BcastExclude = /lib,/usr/lib,/lib64,/usr/lib64
BcastParameters = (null)
BOOT_TIME = 2023-07-11T10:04:31
BurstBufferType = (null)
CliFilterPlugins = (null)
ClusterName = ASlurmCluster
CommunicationParameters = (null)
CompleteWait = 0 sec
CoreSpecPlugin = core_spec/none
CpuFreqDef = Unknown
CpuFreqGovernors = OnDemand,Performance,UserSpace
CredType = cred/munge
DebugFlags = (null)
DefMemPerNode = UNLIMITED
DependencyParameters = kill_invalid_depend
DisableRootJobs = No
EioTimeout = 60
EnforcePartLimits = ANY
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FairShareDampeningFactor = 1
FederationParameters = (null)
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = gpu
GpuFreqDef = high,memory=high
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Match
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 65533 sec
InteractiveStepOptions = --interactive --preserve-env --pty $SHELL
JobAcctGatherFrequency = task=15
JobAcctGatherType = jobacct_gather/cgroup
JobAcctGatherParams = (null)
JobCompHost = localhost
JobCompLoc = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobDefaults = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = lua
KillOnBadExit = 0
KillWait = 30 sec
LaunchParameters = (null)
LaunchType = launch/slurm
Licenses = mplus:1,nonmem:32
LogTimeFormat = iso8601_ms
MailDomain = (null)
MailProg = /bin/mail
MaxArraySize = 90001
MaxDBDMsgs = 701360
MaxJobCount = 350000
MaxJobId = 67043328
MaxMemPerNode = UNLIMITED
MaxNodeCount = 340
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = mcs/none
MCSParameters = (null)
MessageTimeout = 60 sec
MinJobAge = 300 sec
MpiDefault = none
MpiParams = (null)
NEXT_JOB_ID = 12286313
NodeFeaturesPlugins = (null)
OverTimeLimit = 0 min
PluginDir = /usr/lib64/slurm
PlugStackConfig = (null)
PowerParameters = (null)
PowerPlugin =
PreemptMode = OFF
PreemptType = preempt/none
PreemptExemptTime = 00:00:00
PrEpParameters = (null)
PrEpPlugins = prep/script
PriorityParameters = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife = 14-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = No
PriorityFlags = SMALL_RELATIVE_TO_TIME,CALCULATE_RUNNING,MAX_TRES
PriorityMaxAge = 60-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType = priority/multifactor
PriorityWeightAge = 10000
PriorityWeightAssoc = 0
PriorityWeightFairShare = 10000
PriorityWeightJobSize = 1000
PriorityWeightPartition = 1000
PriorityWeightQOS = 1000
PriorityWeightTRES = CPU=1000,Mem=4000,GRES/gpu=3000
PrivateData = none
ProctrackType = proctrack/cgroup
Prolog = (null)
PrologEpilogTimeout = 65534
PrologSlurmctld = (null)
PrologFlags = Alloc,Contain,X11
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = /usr/sbin/reboot
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeFailProgram = (null)
ResumeProgram = (null)
ResumeRate = 300 nodes/min
ResumeTimeout = 60 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 2
RoutePlugin = route/default
SchedulerParameters = batch_sched_delay=10,bf_continue,bf_max_job_part=1000,bf_max_job_test=10000,bf_max_job_user=100,bf_resolution=300,bf_window=10080,bf_yield_interval=1000000,default_queue_depth=1000,partition_job_depth=600,sched_min_interval=20000000,defer,max_rpc_cnt=80
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
ScronParameters = (null)
SelectType = select/cons_tres
SelectTypeParameters = CR_CPU_MEMORY
SlurmUser = slurm(47)
SlurmctldAddr = (null)
SlurmctldDebug = info
SlurmctldHost[0] = ASlurmCluster-sched(x.x.x.x)
SlurmctldLogFile = /data/slurm/slurmctld.log
SlurmctldPort = 6820-6824
SlurmctldSyslogDebug = (null)
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg = (null)
SlurmctldTimeout = 6000 sec
SlurmctldParameters = (null)
SlurmdDebug = info
SlurmdLogFile = /var/log/slurm/slurmd.log
SlurmdParameters = (null)
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdSyslogDebug = (null)
SlurmdTimeout = 600 sec
SlurmdUser = root(0)
SlurmSchedLogFile = (null)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/run/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /etc/slurm/slurm.conf
SLURM_VERSION = 22.05.6
SrunEpilog = (null)
SrunPortRange = 0-0
SrunProlog = (null)
StateSaveLocation = /data/slurm/slurmctld
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = (null)
SuspendRate = 60 nodes/min
SuspendTime = INFINITE
SuspendTimeout = 30 sec
SwitchParameters = (null)
SwitchType = switch/none
TaskEpilog = (null)
TaskPlugin = cgroup,affinity
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TmpFS = /tmp
TopologyParam = (null)
TopologyPlugin = topology/none
TrackWCKey = No
TreeWidth = 50
UsePam = No
UnkillableStepProgram = (null)
UnkillableStepTimeout = 600 sec
VSizeFactor = 0 percent
WaitTime = 0 sec
X11Parameters = home_xauthority

Cgroup Support Configuration:
AllowedKmemSpace = (null)
AllowedRAMSpace = 100.0%
AllowedSwapSpace = 1.0%
CgroupAutomount = yes
CgroupMountpoint = /sys/fs/cgroup
CgroupPlugin = cgroup/v2
ConstrainCores = yes
ConstrainDevices = yes
ConstrainKmemSpace = no
ConstrainRAMSpace = yes
ConstrainSwapSpace = yes
IgnoreSystemd = no
IgnoreSystemdOnFailure = no
MaxKmemPercent = 100.0%
MaxRAMPercent = 100.0%
MaxSwapPercent = 100.0%
MemorySwappiness = (null)
MinKmemSpace = 30 MB
MinRAMSpace = 30 MB

Slurmctld(primary) at ASlurmCluster-sched is UP
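For completeness, once the echo steps above have been applied and slurmd is up, the change can be re-checked from the node. A minimal sketch using the same files the mitigation script writes to; the slurmd cgroup is whatever path /proc reports for the running daemon:

# grep . /sys/fs/cgroup/cgroup.subtree_control /sys/fs/cgroup/system.slice/cgroup.subtree_control
# cat /proc/$(pidof slurmd)/cgroup
# systemctl status slurmd

cpu, cpuset and memory should now be listed in both subtree_control files, and /proc/<pid>/cgroup shows which cgroup the running slurmd actually landed in.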