Hi,
I'm also very interested in how this could be done properly. At the moment what we are doing is setting up partitions with MaxCPUsPerNode set to CPUs minus GPUs, along the lines of the sketch below. Maybe this helps you in the meantime, but it is a suboptimal solution (in fact we have nodes with different numbers of CPUs, so we had to make a partition per "node type"). Someone else may have a better idea.
Cheers,
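For example, something along these lines in slurm.conf (the node names and the 32-core/4-GPU counts below are just placeholders for one of our "node types"; adjust them to your hardware):

# Hypothetical node type: 32 cores and 4 GPUs per node
NodeName=gnode[1-2] CPUs=32 Gres=gpu:4 State=UNKNOWN
# CPU-only partition: jobs from this partition can use at most 32-4=28 cores
# on each node, so 4 cores always remain free for the gpu partition
PartitionName=cpu_type1 Nodes=gnode[1-2] MaxCPUsPerNode=28 Default=NO DefaultTime=60 MaxTime=168:00:00 State=UP
# GPU partition on the same nodes, with no such cap
PartitionName=gpu_type1 Nodes=gnode[1-2] Default=NO DefaultTime=60 MaxTime=168:00:00 State=UP

With several node types you end up repeating this pair of partitions per type, which is the annoying part I mentioned.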
On Mon, Aug 31, 2020 at 16:45, Manuel BERTRAND (<manuel.bertr...@lis-lab.fr>) wrote:

> Hi list,
>
> I am totally new to Slurm and have just deployed a heterogeneous GPU/CPU
> cluster by following the latest OpenHPC recipe on CentOS 8.2 (thanks,
> OpenHPC team, for making those!)
> Everything works great so far, but now I would like to bind a specific
> core to each GPU on each node. By "bind" I mean making a particular core
> not assignable to a CPU-only job, so that the GPU remains usable whatever
> the CPU workload on the node. I'm asking because in the current state a
> CPU-only user can monopolize the whole node, preventing a GPU user from
> getting in, as there is no CPU available even though the GPU is free.
> I'm not sure what the best way to enforce this is. Hope this is clear :)
>
> Any help greatly appreciated!
>
> Here is my gres.conf, cgroup.conf, partitions configuration, followed by
> the output of 'scontrol show config':
>
> ########### gres.conf ############
> NodeName=gpunode1 Name=gpu File=/dev/nvidia0
> NodeName=gpunode1 Name=gpu File=/dev/nvidia1
> NodeName=gpunode1 Name=gpu File=/dev/nvidia2
> NodeName=gpunode1 Name=gpu File=/dev/nvidia3
> NodeName=gpunode2 Name=gpu File=/dev/nvidia0
> NodeName=gpunode2 Name=gpu File=/dev/nvidia1
> NodeName=gpunode2 Name=gpu File=/dev/nvidia2
> NodeName=gpunode3 Name=gpu File=/dev/nvidia0
> NodeName=gpunode3 Name=gpu File=/dev/nvidia1
> NodeName=gpunode3 Name=gpu File=/dev/nvidia2
> NodeName=gpunode3 Name=gpu File=/dev/nvidia3
> NodeName=gpunode3 Name=gpu File=/dev/nvidia4
> NodeName=gpunode3 Name=gpu File=/dev/nvidia5
> NodeName=gpunode3 Name=gpu File=/dev/nvidia6
> NodeName=gpunode3 Name=gpu File=/dev/nvidia7
> NodeName=gpunode4 Name=gpu File=/dev/nvidia0
> NodeName=gpunode4 Name=gpu File=/dev/nvidia1
> NodeName=gpunode5 Name=gpu File=/dev/nvidia0
> NodeName=gpunode5 Name=gpu File=/dev/nvidia1
> NodeName=gpunode5 Name=gpu File=/dev/nvidia2
> NodeName=gpunode5 Name=gpu File=/dev/nvidia3
> NodeName=gpunode5 Name=gpu File=/dev/nvidia4
> NodeName=gpunode5 Name=gpu File=/dev/nvidia5
> NodeName=gpunode6 Name=gpu File=/dev/nvidia0
> NodeName=gpunode6 Name=gpu File=/dev/nvidia1
> NodeName=gpunode6 Name=gpu File=/dev/nvidia2
> NodeName=gpunode6 Name=gpu File=/dev/nvidia3
> NodeName=gpunode7 Name=gpu File=/dev/nvidia0
> NodeName=gpunode7 Name=gpu File=/dev/nvidia1
> NodeName=gpunode7 Name=gpu File=/dev/nvidia2
> NodeName=gpunode7 Name=gpu File=/dev/nvidia3
> NodeName=gpunode8 Name=gpu File=/dev/nvidia0
> NodeName=gpunode8 Name=gpu File=/dev/nvidia1
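By the way, gres.conf also accepts a Cores= field on each line, which tells Slurm which cores to pair with each GPU. A minimal sketch for gpunode1 (the core indices here are made up; you would check the real topology with something like 'nvidia-smi topo -m'):

# Hypothetical core-to-GPU mapping on gpunode1
NodeName=gpunode1 Name=gpu File=/dev/nvidia0 Cores=0
NodeName=gpunode1 Name=gpu File=/dev/nvidia1 Cores=1
NodeName=gpunode1 Name=gpu File=/dev/nvidia2 Cores=2
NodeName=gpunode1 Name=gpu File=/dev/nvidia3 Cores=3

As far as I understand, though, this only steers which cores get allocated to jobs that request the GPUs; it does not by itself keep a CPU-only job from occupying those cores, which is why we fell back on the MaxCPUsPerNode approach above.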
> ########### cgroup.conf ############
> CgroupAutomount=yes
> TaskAffinity=no
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> ConstrainKmemSpace=no
> ConstrainDevices=yes
>
> ########### partitions configuration ###########
> PartitionName=cpu Nodes=cpunode1,cpunode2,cpunode3,cpunode4,cpunode5 Default=NO DefaultTime=60 MaxTime=168:00:00 State=UP
> PartitionName=gpu Nodes=gpunode1,gpunode2,gpunode3,gpunode4,gpunode5,gpunode6,gpunode7,gpunode8 Default=NO DefaultTime=60 MaxTime=168:00:00 State=UP
> PartitionName=all Nodes=ALL Default=YES DefaultTime=60 MaxTime=168:00:00 State=UP
>
> ########### Slurm configuration ###########
> Configuration data as of 2020-08-31T16:23:54
> AccountingStorageBackupHost = (null)
> AccountingStorageEnforce = none
> AccountingStorageHost = sms.mycluster
> AccountingStorageLoc = N/A
> AccountingStoragePort = 6819
> AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages
> AccountingStorageType = accounting_storage/slurmdbd
> AccountingStorageUser = N/A
> AccountingStoreJobComment = No
> AcctGatherEnergyType = acct_gather_energy/none
> AcctGatherFilesystemType = acct_gather_filesystem/none
> AcctGatherInterconnectType = acct_gather_interconnect/none
> AcctGatherNodeFreq = 0 sec
> AcctGatherProfileType = acct_gather_profile/none
> AllowSpecResourcesUsage = No
> AuthAltTypes = (null)
> AuthInfo = (null)
> AuthType = auth/munge
> BatchStartTimeout = 10 sec
>
> EpilogMsgTime = 2000 usec
> EpilogSlurmctld = (null)
> ExtSensorsType = ext_sensors/none
> ExtSensorsFreq = 0 sec
> FederationParameters = (null)
> FirstJobId = 1
> GetEnvTimeout = 2 sec
> GresTypes = gpu
> GpuFreqDef = high,memory=high
> GroupUpdateForce = 1
> GroupUpdateTime = 600 sec
> HASH_VAL = Match
> HealthCheckInterval = 300 sec
> HealthCheckNodeState = ANY
> HealthCheckProgram = /usr/sbin/nhc
> InactiveLimit = 0 sec
> JobAcctGatherFrequency = 30
> JobAcctGatherType = jobacct_gather/none
> JobAcctGatherParams = (null)
> JobCompHost = localhost
> JobCompLoc = /var/log/slurm_jobcomp.log
> JobCompPort = 0
> JobCompType = jobcomp/none
> JobCompUser = root
> JobContainerType = job_container/none
> JobCredentialPrivateKey = (null)
> JobCredentialPublicCertificate = (null)
> JobDefaults = (null)
> JobFileAppend = 0
> JobRequeue = 1
> JobSubmitPlugins = (null)
> KeepAliveTime = SYSTEM_DEFAULT
> KillOnBadExit = 0
> KillWait = 30 sec
> LaunchParameters = (null)
> LaunchType = launch/slurm
> Layouts =
> Licenses = (null)
> LogTimeFormat = iso8601_ms
> MailDomain = (null)
> MailProg = /usr/bin/mail
> MaxArraySize = 1001
> MaxDBDMsgs = 20052
> MaxJobCount = 10000
> MaxJobId = 67043328
> MaxMemPerNode = UNLIMITED
> MaxStepCount = 40000
>
> PropagateResourceLimits = (null)
> PropagateResourceLimitsExcept = MEMLOCK
> RebootProgram = /sbin/reboot
> ReconfigFlags = (null)
> RequeueExit = (null)
> RequeueExitHold = (null)
> ResumeFailProgram = (null)
> ResumeProgram = (null)
> ResumeRate = 300 nodes/min
> ResumeTimeout = 600 sec
> ResvEpilog = (null)
> ResvOverRun = 0 min
> ResvProlog = (null)
> ReturnToService = 2
> RoutePlugin = route/default
> SallocDefaultCommand = (null)
> SbcastParameters = (null)
> SchedulerParameters = (null)
> SchedulerTimeSlice = 30 sec
> SchedulerType = sched/backfill
> SelectType = select/cons_tres
> SelectTypeParameters = CR_CORE
> SlurmUser = slurm(202)
> SlurmctldAddr = (null)
> SlurmctldDebug = debug2
> SlurmctldHost[0] = sms.mycluster
> SlurmctldLogFile = /var/log/slurmctld.log
> SlurmctldPort = 6817
> SlurmctldSyslogDebug = unknown
> SlurmctldPrimaryOffProg = (null)
> SlurmctldPrimaryOnProg = (null)
> SlurmctldTimeout = 300 sec
> SlurmctldParameters = enable_configless
> SlurmdDebug = debug2
> SlurmdLogFile = /var/log/slurmd.log
> SlurmdParameters = (null)
> SlurmdPidFile = /var/run/slurmd.pid
> SlurmdPort = 6818
> SlurmdSpoolDir = /var/spool/slurm/d
> SlurmdSyslogDebug = unknown
> SlurmdTimeout = 300 sec
> SlurmdUser = root(0)
> SlurmSchedLogFile = (null)
> SlurmSchedLogLevel = 0
> SlurmctldPidFile = /var/run/slurmctld.pid
> SlurmctldPlugstack = (null)
> SLURM_CONF = /etc/slurm/slurm.conf
>
> SrunPortRange = 0-0
> SrunProlog = (null)
> StateSaveLocation = /var/spool/slurm/ctld
> SuspendExcNodes = (null)
> SuspendExcParts = (null)
> SuspendProgram = (null)
> SuspendRate = 60 nodes/min
> SuspendTime = NONE
> SuspendTimeout = 30 sec
> SwitchType = switch/none
> TaskEpilog = (null)
> TaskPlugin = task/affinity,task/cgroup
> TaskPluginParam = (null type)
> TaskProlog = (null)
> TCPTimeout = 2 sec
> TmpFS = /scratch
> TopologyParam = (null)
> TopologyPlugin = topology/none
> TrackWCKey = No
> TreeWidth = 50
> UsePam = No
> UnkillableStepProgram = (null)
> UnkillableStepTimeout = 60 sec
> VSizeFactor = 0 percent
> WaitTime = 0 sec
> X11Parameters = (null)
>
> Cgroup Support Configuration:
> AllowedDevicesFile = /etc/slurm/cgroup_allowed_devices_file.conf
> AllowedKmemSpace = (null)
> AllowedRAMSpace = 100.0%
> AllowedSwapSpace = 0.0%
> CgroupAutomount = yes
> CgroupMountpoint = /sys/fs/cgroup
> ConstrainCores = yes
> ConstrainDevices = yes
> ConstrainKmemSpace = no
> ConstrainRAMSpace = yes
> ConstrainSwapSpace = yes
> MaxKmemPercent = 100.0%
> MaxRAMPercent = 100.0%
> MaxSwapPercent = 100.0%
> MemorySwappiness = (null)
> MinKmemSpace = 30 MB
> MinRAMSpace = 30 MB
> TaskAffinity = no
>
> Slurmctld(primary) at sms.mycluster is UP

--
Stephan Schott Verdugo
Biochemist
Heinrich-Heine-Universitaet Duesseldorf
Institut fuer Pharm. und Med. Chemie
Universitaetsstr. 1
40225 Duesseldorf
Germany