Hi Sean,

On 1/14/21 9:19 AM, Sean Crosby wrote:
Hi Abhiram,

You need to configure cgroup.conf to constrain the devices a job has access to. See https://slurm.schedmd.com/cgroup.conf.html

My cgroup.conf is

CgroupAutomount=yes
AllowedDevicesFile="/usr/local/slurm/etc/cgroup_allowed_devices_file.conf"

ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainDevices=yes

TaskAffinity=no

CgroupMountpoint=/sys/fs/cgroup

The ConstrainDevices=yes setting is the key to stopping jobs from accessing GPUs they didn't request.

I'm curious about your AllowedDevicesFile parameter, which doesn't seem to exist in the current Slurm 20.* versions. Can you confirm that AllowedDevicesFile comes from an older Slurm version?

Device files are currently handled in the gres.conf file; see https://slurm.schedmd.com/gres.conf.html
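
For illustration, a minimal gres.conf sketch for a node with four NVIDIA GPUs might look like the lines below. The node name and device paths are assumptions made up for this example, not taken from any real cluster:

# gres.conf (hypothetical node name and device paths)
NodeName=gpunode01 Name=gpu File=/dev/nvidia[0-3]

The matching slurm.conf would also need GresTypes=gpu and a Gres=gpu:4 entry on that node's NodeName line. With ConstrainDevices=yes in cgroup.conf, a job should then only be able to access the device files of the GPUs it was allocated.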

Thanks,
Ole
