Hi all,

We want to add some GRES resource types describing our GPUs (amount of GPU
memory and number of CUDA cores) on some of our nodes. So we added the
following parameters to 'gres.conf' on the nodes that have GPUs:

Name=gpu_mem Count=<#>G 
Name=gpu_cores Count=<#>

And in slurm.conf:

GresTypes=gpu,gpu_mem,gpu_cores

And down in the NodeName lines for these servers:

Gres=gpu:<#>,gpu_mem:no_consume:<#>G,gpu_cores:no_consume:<#>

(where <#> of course is the relevant numerical value)

However, after restarting slurmctld on the controller and slurmd on the
clients, the nodes reject this configuration and report a reason such as:

Reason=gres/gpu_mem count too low (0 < 4294967296) [root@2018-09-24T11:36:01]

The nodes then go into the DRAIN state.
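
For what it's worth, the large number in that message looks like a count of 4G
expanded with a multiplier of 1024^3, which would mean the controller parsed
the value from slurm.conf while the node itself reported 0 for gpu_mem. Below
is a quick sanity check of that arithmetic, as a sketch only: it assumes the
K/M/G/T suffixes are powers of 1024 and that the node in question was
configured with 4G (inferred from the message, not shown above). The helper
name gres_count is made up for illustration and is not a Slurm function.

```python
# Hypothetical helper, not part of Slurm: expands a count such as "4G"
# assuming suffixes are powers of 1024.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def gres_count(text):
    """Return the plain integer for a count string like '4G' or '8'."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


configured = gres_count("4G")  # assumed slurm.conf value for this node
reported = 0                   # value on the left of "0 < ..." in the Reason

print(configured)              # 4294967296
print(reported < configured)   # True, consistent with "count too low"
```

So the right-hand number matches the count we configured, and the 0 appears to
be what the node reported for that GRES.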

We are running Slurm v16.04.5. Is something like the above possible on this
version? If so, what could be the problem?

