The systems have only cgroup/v2 enabled:

# mount | egrep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

Distribution and kernel: RedHat 8.7, 4.18.0-348.2.1.el8_5.x86_64
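For completeness, a quick way to see which controllers the kernel offers and which are actually enabled for child cgroups is to read the standard cgroup v2 files directly (these are plain kernel/systemd paths, nothing Slurm-specific; shown here only as a sketch of the check):

# cat /sys/fs/cgroup/cgroup.controllers
# cat /sys/fs/cgroup/cgroup.subtree_control
# cat /sys/fs/cgroup/system.slice/cgroup.subtree_control

The "Controller cpuset is not enabled!" / "Controller cpu is not enabled!" errors quoted below appear to correspond to cpu and cpuset being absent from these subtree_control files until the echo steps in the mitigation script are run.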
-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Hermann Schwärzler
Sent: Wednesday, July 12, 2023 4:36 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] cgroupv2 + slurmd - external cgroup changes needed to get daemon to start

Hi Jenny,

I *guess* you have a system that has both cgroup/v1 and cgroup/v2 enabled.

Which Linux distribution are you using? And which kernel version?
What is the output of

  mount | grep cgroup

What if you do not restrict the cgroup version Slurm can use to cgroup/v2, but instead omit "CgroupPlugin=..." from your cgroup.conf?

Regards,
Hermann

On 7/11/23 19:41, Williams, Jenny Avis wrote:
> Additional configuration information -- /etc/slurm/cgroup.conf
>
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> CgroupPlugin=cgroup/v2
> AllowedSwapSpace=1
> ConstrainSwapSpace=yes
> ConstrainDevices=yes
>
> From: Williams, Jenny Avis
> Sent: Tuesday, July 11, 2023 10:47 AM
> To: slurm-us...@schedmd.com
> Subject: cgroupv2 + slurmd - external cgroup changes needed to get daemon to start
>
> Progress on getting slurmd to start under cgroupv2
>
> Issue: slurmd 22.05.6 will not start when using cgroupv2
>
> Expected result: even after reboot, slurmd will start up without needing to manually add lines to /sys/fs/cgroup files.
>
> When started as a service, the error is:
>
> # systemctl status slurmd
> * slurmd.service - Slurm node daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>   Drop-In: /etc/systemd/system/slurmd.service.d
>            `-extendUnit.conf
>    Active: failed (Result: exit-code) since Tue 2023-07-11 10:29:23 EDT; 2s ago
>   Process: 11395 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
>  Main PID: 11395 (code=exited, status=1/FAILURE)
>
> Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: Started Slurm node daemon.
> Jul 11 10:29:23 g1803jles01.ll.unc.edu slurmd[11395]: slurmd: slurmd version 22.05.6 started
> Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
> Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Failed with result 'exit-code'.
>
> When started at the command line, the output is:
>
> # slurmd -D -vvv 2>&1 | egrep error
> slurmd: error: Controller cpuset is not enabled!
> slurmd: error: Controller cpu is not enabled!
> slurmd: error: Controller cpuset is not enabled!
> slurmd: error: Controller cpu is not enabled!
> slurmd: error: Controller cpuset is not enabled!
> slurmd: error: Controller cpu is not enabled!
> slurmd: error: Controller cpuset is not enabled!
> slurmd: error: Controller cpu is not enabled!
> slurmd: error: cpu cgroup controller is not available.
> slurmd: error: There's an issue initializing memory or cpu controller
> slurmd: error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed
> slurmd: error: cannot create jobacct_gather context for jobacct_gather/cgroup
>
> Steps to mitigate the issue:
>
> While the following steps do not solve the issue, they do get the system into a state in which slurmd will start, at least until the next reboot. The reinstall of slurm-slurmd is a one-time step to ensure that local service modifications are out of the picture.
> Currently, even after reboot, the cgroup echo steps are necessary at a minimum. (A possible way to make the controller delegation persistent is sketched after the configuration dump below.)
>
> #!/bin/bash
> /usr/bin/dnf -y reinstall slurm-slurmd
> systemctl daemon-reload
> /usr/bin/pkill -f '/usr/sbin/slurmstepd infinity'
> systemctl enable slurmd
> systemctl stop dcismeng.service && \
> /usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/cgroup.subtree_control && \
> /usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/system.slice/cgroup.subtree_control && \
> systemctl start slurmd && \
> echo 'run this: systemctl start dcismeng'
>
> Environment:
>
> # scontrol show config
> Configuration data as of 2023-07-11T10:39:48
> AccountingStorageBackupHost = (null)
> AccountingStorageEnforce = associations,limits,qos,safe
> AccountingStorageHost = m1006
> AccountingStorageExternalHost = (null)
> AccountingStorageParameters = (null)
> AccountingStoragePort = 6819
> AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu
> AccountingStorageType = accounting_storage/slurmdbd
> AccountingStorageUser = N/A
> AccountingStoreFlags = (null)
> AcctGatherEnergyType = acct_gather_energy/none
> AcctGatherFilesystemType = acct_gather_filesystem/none
> AcctGatherInterconnectType = acct_gather_interconnect/none
> AcctGatherNodeFreq = 0 sec
> AcctGatherProfileType = acct_gather_profile/none
> AllowSpecResourcesUsage = No
> AuthAltTypes = (null)
> AuthAltParameters = (null)
> AuthInfo = (null)
> AuthType = auth/munge
> BatchStartTimeout = 10 sec
> BcastExclude = /lib,/usr/lib,/lib64,/usr/lib64
> BcastParameters = (null)
> BOOT_TIME = 2023-07-11T10:04:31
> BurstBufferType = (null)
> CliFilterPlugins = (null)
> ClusterName = ASlurmCluster
> CommunicationParameters = (null)
> CompleteWait = 0 sec
> CoreSpecPlugin = core_spec/none
> CpuFreqDef = Unknown
> CpuFreqGovernors = OnDemand,Performance,UserSpace
> CredType = cred/munge
> DebugFlags = (null)
> DefMemPerNode = UNLIMITED
> DependencyParameters = kill_invalid_depend
> DisableRootJobs = No
> EioTimeout = 60
> EnforcePartLimits = ANY
> Epilog = (null)
> EpilogMsgTime = 2000 usec
> EpilogSlurmctld = (null)
> ExtSensorsType = ext_sensors/none
> ExtSensorsFreq = 0 sec
> FairShareDampeningFactor = 1
> FederationParameters = (null)
> FirstJobId = 1
> GetEnvTimeout = 2 sec
> GresTypes = gpu
> GpuFreqDef = high,memory=high
> GroupUpdateForce = 1
> GroupUpdateTime = 600 sec
> HASH_VAL = Match
> HealthCheckInterval = 0 sec
> HealthCheckNodeState = ANY
> HealthCheckProgram = (null)
> InactiveLimit = 65533 sec
> InteractiveStepOptions = --interactive --preserve-env --pty $SHELL
> JobAcctGatherFrequency = task=15
> JobAcctGatherType = jobacct_gather/cgroup
> JobAcctGatherParams = (null)
> JobCompHost = localhost
> JobCompLoc = /var/log/slurm_jobcomp.log
> JobCompPort = 0
> JobCompType = jobcomp/none
> JobCompUser = root
> JobContainerType = job_container/none
> JobCredentialPrivateKey = (null)
> JobCredentialPublicCertificate = (null)
> JobDefaults = (null)
> JobFileAppend = 0
> JobRequeue = 1
> JobSubmitPlugins = lua
> KillOnBadExit = 0
> KillWait = 30 sec
> LaunchParameters = (null)
> LaunchType = launch/slurm
> Licenses = mplus:1,nonmem:32
> LogTimeFormat = iso8601_ms
> MailDomain = (null)
> MailProg = /bin/mail
> MaxArraySize = 90001
> MaxDBDMsgs = 701360
> MaxJobCount = 350000
> MaxJobId = 67043328
> MaxMemPerNode = UNLIMITED
> MaxNodeCount = 340
> MaxStepCount = 40000
> MaxTasksPerNode = 512
> MCSPlugin = mcs/none
> MCSParameters = (null)
> MessageTimeout = 60 sec
> MinJobAge = 300 sec
> MpiDefault = none
> MpiParams = (null)
> NEXT_JOB_ID = 12286313
> NodeFeaturesPlugins = (null)
> OverTimeLimit = 0 min
> PluginDir = /usr/lib64/slurm
> PlugStackConfig = (null)
> PowerParameters = (null)
> PowerPlugin =
> PreemptMode = OFF
> PreemptType = preempt/none
> PreemptExemptTime = 00:00:00
> PrEpParameters = (null)
> PrEpPlugins = prep/script
> PriorityParameters = (null)
> PrioritySiteFactorParameters = (null)
> PrioritySiteFactorPlugin = (null)
> PriorityDecayHalfLife = 14-00:00:00
> PriorityCalcPeriod = 00:05:00
> PriorityFavorSmall = No
> PriorityFlags = SMALL_RELATIVE_TO_TIME,CALCULATE_RUNNING,MAX_TRES
> PriorityMaxAge = 60-00:00:00
> PriorityUsageResetPeriod = NONE
> PriorityType = priority/multifactor
> PriorityWeightAge = 10000
> PriorityWeightAssoc = 0
> PriorityWeightFairShare = 10000
> PriorityWeightJobSize = 1000
> PriorityWeightPartition = 1000
> PriorityWeightQOS = 1000
> PriorityWeightTRES = CPU=1000,Mem=4000,GRES/gpu=3000
> PrivateData = none
> ProctrackType = proctrack/cgroup
> Prolog = (null)
> PrologEpilogTimeout = 65534
> PrologSlurmctld = (null)
> PrologFlags = Alloc,Contain,X11
> PropagatePrioProcess = 0
> PropagateResourceLimits = ALL
> PropagateResourceLimitsExcept = (null)
> RebootProgram = /usr/sbin/reboot
> ReconfigFlags = (null)
> RequeueExit = (null)
> RequeueExitHold = (null)
> ResumeFailProgram = (null)
> ResumeProgram = (null)
> ResumeRate = 300 nodes/min
> ResumeTimeout = 60 sec
> ResvEpilog = (null)
> ResvOverRun = 0 min
> ResvProlog = (null)
> ReturnToService = 2
> RoutePlugin = route/default
> SchedulerParameters = batch_sched_delay=10,bf_continue,bf_max_job_part=1000,bf_max_job_test=10000,bf_max_job_user=100,bf_resolution=300,bf_window=10080,bf_yield_interval=1000000,default_queue_depth=1000,partition_job_depth=600,sched_min_interval=20000000,defer,max_rpc_cnt=80
> SchedulerTimeSlice = 30 sec
> SchedulerType = sched/backfill
> ScronParameters = (null)
> SelectType = select/cons_tres
> SelectTypeParameters = CR_CPU_MEMORY
> SlurmUser = slurm(47)
> SlurmctldAddr = (null)
> SlurmctldDebug = info
> SlurmctldHost[0] = ASlurmCluster-sched(x.x.x.x)
> SlurmctldLogFile = /data/slurm/slurmctld.log
> SlurmctldPort = 6820-6824
> SlurmctldSyslogDebug = (null)
> SlurmctldPrimaryOffProg = (null)
> SlurmctldPrimaryOnProg = (null)
> SlurmctldTimeout = 6000 sec
> SlurmctldParameters = (null)
> SlurmdDebug = info
> SlurmdLogFile = /var/log/slurm/slurmd.log
> SlurmdParameters = (null)
> SlurmdPidFile = /var/run/slurmd.pid
> SlurmdPort = 6818
> SlurmdSpoolDir = /var/spool/slurmd
> SlurmdSyslogDebug = (null)
> SlurmdTimeout = 600 sec
> SlurmdUser = root(0)
> SlurmSchedLogFile = (null)
> SlurmSchedLogLevel = 0
> SlurmctldPidFile = /var/run/slurmctld.pid
> SlurmctldPlugstack = (null)
> SLURM_CONF = /etc/slurm/slurm.conf
> SLURM_VERSION = 22.05.6
> SrunEpilog = (null)
> SrunPortRange = 0-0
> SrunProlog = (null)
> StateSaveLocation = /data/slurm/slurmctld
> SuspendExcNodes = (null)
> SuspendExcParts = (null)
> SuspendProgram = (null)
> SuspendRate = 60 nodes/min
> SuspendTime = INFINITE
> SuspendTimeout = 30 sec
> SwitchParameters = (null)
> SwitchType = switch/none
> TaskEpilog = (null)
> TaskPlugin = cgroup,affinity
> TaskPluginParam = (null type)
> TaskProlog = (null)
> TCPTimeout = 2 sec
> TmpFS = /tmp
> TopologyParam = (null)
> TopologyPlugin = topology/none
> TrackWCKey = No
> TreeWidth = 50
> UsePam = No
> UnkillableStepProgram = (null)
> UnkillableStepTimeout = 600 sec
> VSizeFactor = 0 percent
> WaitTime = 0 sec
> X11Parameters = home_xauthority
>
> Cgroup Support Configuration:
> AllowedKmemSpace = (null)
> AllowedRAMSpace = 100.0%
> AllowedSwapSpace = 1.0%
> CgroupAutomount = yes
> CgroupMountpoint = /sys/fs/cgroup
> CgroupPlugin = cgroup/v2
> ConstrainCores = yes
> ConstrainDevices = yes
> ConstrainKmemSpace = no
> ConstrainRAMSpace = yes
> ConstrainSwapSpace = yes
> IgnoreSystemd = no
> IgnoreSystemdOnFailure = no
> MaxKmemPercent = 100.0%
> MaxRAMPercent = 100.0%
> MaxSwapPercent = 100.0%
> MemorySwappiness = (null)
> MinKmemSpace = 30 MB
> MinRAMSpace = 30 MB
>
> Slurmctld(primary) at ASlurmCluster-sched is UP
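As referenced above, one way to avoid echoing into subtree_control by hand on every boot may be to let systemd delegate the controllers to the slurmd unit itself. A minimal, untested sketch, assuming a drop-in alongside the existing extendUnit.conf (the file name delegate.conf is purely illustrative):

# mkdir -p /etc/systemd/system/slurmd.service.d
# cat > /etc/systemd/system/slurmd.service.d/delegate.conf <<'EOF'
[Service]
Delegate=yes
EOF
# systemctl daemon-reload
# systemctl restart slurmd

With Delegate=yes, systemd enables the available controllers (including cpu, cpuset and memory) in the subtree_control files leading down to slurmd's cgroup, which corresponds to what the two echo lines in the mitigation script do manually.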