Hi Jenny,

I *guess* you have a system that has both cgroup/v1 and cgroup/v2 enabled.

Which Linux distribution are you using? And which kernel version?
What is the output of
  mount | grep cgroup
What happens if you do not restrict Slurm to cgroup/v2, i.e. if you omit "CgroupPlugin=..." from your cgroup.conf?
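
If that mount output shows both a "cgroup2" mount and a number of "cgroup" (v1) mounts, you are running a hybrid hierarchy, and the v1 controllers can keep cpu/cpuset out of reach of cgroup/v2. As a sketch only, assuming a RHEL-family system with grubby (your mitigation script uses dnf, so that seems likely), a pure cgroup/v2 boot can be forced like this:

  # enable the unified (v2) hierarchy and disable all v1 controllers
  grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all"
  reboot

Adjust for your distribution and bootloader if you are not on a grubby-based system.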

Regards,
Hermann

On 7/11/23 19:41, Williams, Jenny Avis wrote:
Additional configuration information -- /etc/slurm/cgroup.conf

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
CgroupPlugin=cgroup/v2
AllowedSwapSpace=1
ConstrainSwapSpace=yes
ConstrainDevices=yes

From: Williams, Jenny Avis
Sent: Tuesday, July 11, 2023 10:47 AM
To: slurm-us...@schedmd.com
Subject: cgroupv2 + slurmd - external cgroup changes needed to get daemon to start

Progress on getting slurmd to start under cgroupv2

Issue: slurmd 22.05.6 will not start when using cgroupv2

Expected result: even after a reboot, slurmd starts without the need to manually add lines to files under /sys/fs/cgroup.

When started as a service, the error is:

# systemctl status slurmd
* slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/slurmd.service.d
           `-extendUnit.conf
   Active: failed (Result: exit-code) since Tue 2023-07-11 10:29:23 EDT; 2s ago
  Process: 11395 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 11395 (code=exited, status=1/FAILURE)

Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: Started Slurm node daemon.
Jul 11 10:29:23 g1803jles01.ll.unc.edu slurmd[11395]: slurmd: slurmd version 22.05.6 started
Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Failed with result 'exit-code'.

When started at the command line, the output is:

# slurmd -D -vvv 2>&1 | egrep error
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: cpu cgroup controller is not available.
slurmd: error: There's an issue initializing memory or cpu controller
slurmd: error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed
slurmd: error: cannot create jobacct_gather context for jobacct_gather/cgroup
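
These errors say that the cpu and cpuset controllers are not enabled for the cgroup slurmd runs in. A quick way to verify, assuming the standard cgroup/v2 mount point:

  # controllers the kernel makes available at the root
  cat /sys/fs/cgroup/cgroup.controllers
  # controllers actually enabled for child cgroups
  cat /sys/fs/cgroup/cgroup.subtree_control
  cat /sys/fs/cgroup/system.slice/cgroup.subtree_control

If cpu and cpuset show up in cgroup.controllers but not in the two subtree_control files, the controllers have simply not been enabled down the hierarchy, which is exactly what the echo workaround below compensates for.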

Steps to mitigate the issue:

While the following steps do not solve the underlying issue, they do get the system into a state where slurmd will start, at least until the next reboot. Reinstalling slurm-slurmd is a one-time step to ensure that local service modifications are out of the picture. Currently, even after a reboot, the cgroup echo steps are necessary at a minimum.

#!/bin/bash
# One-time step: reinstall slurm-slurmd to rule out local service modifications
/usr/bin/dnf -y reinstall slurm-slurmd
systemctl daemon-reload
# kill any "slurmstepd infinity" sleeper left over from a previous slurmd
/usr/bin/pkill -f '/usr/sbin/slurmstepd infinity'
systemctl enable slurmd
# enable the cpu, cpuset and memory controllers down the hierarchy,
# then start slurmd; this is needed again after every reboot
systemctl stop dcismeng.service && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/cgroup.subtree_control && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/system.slice/cgroup.subtree_control && \
systemctl start slurmd && \
  echo 'run this: systemctl start dcismeng'
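
A possibly persistent alternative to the echo lines, sketched here as an untested assumption rather than a verified fix: let systemd delegate the controllers to the slurmd unit so they are enabled on every boot. A drop-in directory already exists (extendUnit.conf above), so another drop-in would do; the file name here is illustrative:

  # /etc/systemd/system/slurmd.service.d/delegate.conf
  [Service]
  Delegate=yes

Delegate=yes tells systemd to hand the unit its own cgroup subtree with the available controllers enabled. After adding it, run systemctl daemon-reload and restart slurmd.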

Environment:

# scontrol show config

Configuration data as of 2023-07-11T10:39:48
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,qos,safe
AccountingStorageHost   = m1006
AccountingStorageExternalHost = (null)
AccountingStorageParameters = (null)
AccountingStoragePort   = 6819
AccountingStorageTRES   = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu
AccountingStorageType   = accounting_storage/slurmdbd
AccountingStorageUser   = N/A
AccountingStoreFlags    = (null)
AcctGatherEnergyType    = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInterconnectType = acct_gather_interconnect/none
AcctGatherNodeFreq      = 0 sec
AcctGatherProfileType   = acct_gather_profile/none
AllowSpecResourcesUsage = No
AuthAltTypes            = (null)
AuthAltParameters       = (null)
AuthInfo                = (null)
AuthType                = auth/munge
BatchStartTimeout       = 10 sec
BcastExclude            = /lib,/usr/lib,/lib64,/usr/lib64
BcastParameters         = (null)
BOOT_TIME               = 2023-07-11T10:04:31
BurstBufferType         = (null)
CliFilterPlugins        = (null)
ClusterName             = ASlurmCluster
CommunicationParameters = (null)
CompleteWait            = 0 sec
CoreSpecPlugin          = core_spec/none
CpuFreqDef              = Unknown
CpuFreqGovernors        = OnDemand,Performance,UserSpace
CredType                = cred/munge
DebugFlags              = (null)
DefMemPerNode           = UNLIMITED
DependencyParameters    = kill_invalid_depend
DisableRootJobs         = No
EioTimeout              = 60
EnforcePartLimits       = ANY
Epilog                  = (null)
EpilogMsgTime           = 2000 usec
EpilogSlurmctld         = (null)
ExtSensorsType          = ext_sensors/none
ExtSensorsFreq          = 0 sec
FairShareDampeningFactor = 1
FederationParameters    = (null)
FirstJobId              = 1
GetEnvTimeout           = 2 sec
GresTypes               = gpu
GpuFreqDef              = high,memory=high
GroupUpdateForce        = 1
GroupUpdateTime         = 600 sec
HASH_VAL                = Match
HealthCheckInterval     = 0 sec
HealthCheckNodeState    = ANY
HealthCheckProgram      = (null)
InactiveLimit           = 65533 sec
InteractiveStepOptions  = --interactive --preserve-env --pty $SHELL
JobAcctGatherFrequency  = task=15
JobAcctGatherType       = jobacct_gather/cgroup
JobAcctGatherParams     = (null)
JobCompHost             = localhost
JobCompLoc              = /var/log/slurm_jobcomp.log
JobCompPort             = 0
JobCompType             = jobcomp/none
JobCompUser             = root
JobContainerType        = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobDefaults             = (null)
JobFileAppend           = 0
JobRequeue              = 1
JobSubmitPlugins        = lua
KillOnBadExit           = 0
KillWait                = 30 sec
LaunchParameters        = (null)
LaunchType              = launch/slurm
Licenses                = mplus:1,nonmem:32
LogTimeFormat           = iso8601_ms
MailDomain              = (null)
MailProg                = /bin/mail
MaxArraySize            = 90001
MaxDBDMsgs              = 701360
MaxJobCount             = 350000
MaxJobId                = 67043328
MaxMemPerNode           = UNLIMITED
MaxNodeCount            = 340
MaxStepCount            = 40000
MaxTasksPerNode         = 512
MCSPlugin               = mcs/none
MCSParameters           = (null)
MessageTimeout          = 60 sec
MinJobAge               = 300 sec
MpiDefault              = none
MpiParams               = (null)
NEXT_JOB_ID             = 12286313
NodeFeaturesPlugins     = (null)
OverTimeLimit           = 0 min
PluginDir               = /usr/lib64/slurm
PlugStackConfig         = (null)
PowerParameters         = (null)
PowerPlugin             =
PreemptMode             = OFF
PreemptType             = preempt/none
PreemptExemptTime       = 00:00:00
PrEpParameters          = (null)
PrEpPlugins             = prep/script
PriorityParameters      = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife   = 14-00:00:00
PriorityCalcPeriod      = 00:05:00
PriorityFavorSmall      = No
PriorityFlags           = SMALL_RELATIVE_TO_TIME,CALCULATE_RUNNING,MAX_TRES
PriorityMaxAge          = 60-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType            = priority/multifactor
PriorityWeightAge       = 10000
PriorityWeightAssoc     = 0
PriorityWeightFairShare = 10000
PriorityWeightJobSize   = 1000
PriorityWeightPartition = 1000
PriorityWeightQOS       = 1000
PriorityWeightTRES      = CPU=1000,Mem=4000,GRES/gpu=3000
PrivateData             = none
ProctrackType           = proctrack/cgroup
Prolog                  = (null)
PrologEpilogTimeout     = 65534
PrologSlurmctld         = (null)
PrologFlags             = Alloc,Contain,X11
PropagatePrioProcess    = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram           = /usr/sbin/reboot
ReconfigFlags           = (null)
RequeueExit             = (null)
RequeueExitHold         = (null)
ResumeFailProgram       = (null)
ResumeProgram           = (null)
ResumeRate              = 300 nodes/min
ResumeTimeout           = 60 sec
ResvEpilog              = (null)
ResvOverRun             = 0 min
ResvProlog              = (null)
ReturnToService         = 2
RoutePlugin             = route/default
SchedulerParameters     = batch_sched_delay=10,bf_continue,bf_max_job_part=1000,bf_max_job_test=10000,bf_max_job_user=100,bf_resolution=300,bf_window=10080,bf_yield_interval=1000000,default_queue_depth=1000,partition_job_depth=600,sched_min_interval=20000000,defer,max_rpc_cnt=80
SchedulerTimeSlice      = 30 sec
SchedulerType           = sched/backfill
ScronParameters         = (null)
SelectType              = select/cons_tres
SelectTypeParameters    = CR_CPU_MEMORY
SlurmUser               = slurm(47)
SlurmctldAddr           = (null)
SlurmctldDebug          = info
SlurmctldHost[0]        = ASlurmCluster-sched(x.x.x.x)
SlurmctldLogFile        = /data/slurm/slurmctld.log
SlurmctldPort           = 6820-6824
SlurmctldSyslogDebug    = (null)
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg  = (null)
SlurmctldTimeout        = 6000 sec
SlurmctldParameters     = (null)
SlurmdDebug             = info
SlurmdLogFile           = /var/log/slurm/slurmd.log
SlurmdParameters        = (null)
SlurmdPidFile           = /var/run/slurmd.pid
SlurmdPort              = 6818
SlurmdSpoolDir          = /var/spool/slurmd
SlurmdSyslogDebug       = (null)
SlurmdTimeout           = 600 sec
SlurmdUser              = root(0)
SlurmSchedLogFile       = (null)
SlurmSchedLogLevel      = 0
SlurmctldPidFile        = /var/run/slurmctld.pid
SlurmctldPlugstack      = (null)
SLURM_CONF              = /etc/slurm/slurm.conf
SLURM_VERSION           = 22.05.6
SrunEpilog              = (null)
SrunPortRange           = 0-0
SrunProlog              = (null)
StateSaveLocation       = /data/slurm/slurmctld
SuspendExcNodes         = (null)
SuspendExcParts         = (null)
SuspendProgram          = (null)
SuspendRate             = 60 nodes/min
SuspendTime             = INFINITE
SuspendTimeout          = 30 sec
SwitchParameters        = (null)
SwitchType              = switch/none
TaskEpilog              = (null)
TaskPlugin              = cgroup,affinity
TaskPluginParam         = (null type)
TaskProlog              = (null)
TCPTimeout              = 2 sec
TmpFS                   = /tmp
TopologyParam           = (null)
TopologyPlugin          = topology/none
TrackWCKey              = No
TreeWidth               = 50
UsePam                  = No
UnkillableStepProgram   = (null)
UnkillableStepTimeout   = 600 sec
VSizeFactor             = 0 percent
WaitTime                = 0 sec
X11Parameters           = home_xauthority

Cgroup Support Configuration:
AllowedKmemSpace        = (null)
AllowedRAMSpace         = 100.0%
AllowedSwapSpace        = 1.0%
CgroupAutomount         = yes
CgroupMountpoint        = /sys/fs/cgroup
CgroupPlugin            = cgroup/v2
ConstrainCores          = yes
ConstrainDevices        = yes
ConstrainKmemSpace      = no
ConstrainRAMSpace       = yes
ConstrainSwapSpace      = yes
IgnoreSystemd           = no
IgnoreSystemdOnFailure  = no
MaxKmemPercent          = 100.0%
MaxRAMPercent           = 100.0%
MaxSwapPercent          = 100.0%
MemorySwappiness        = (null)
MinKmemSpace            = 30 MB
MinRAMSpace             = 30 MB

Slurmctld(primary) at ASlurmCluster-sched is UP
