Hi Sean,
On 1/14/21 9:19 AM, Sean Crosby wrote:
> Hi Abhiram,
>
> You need to configure cgroup.conf to constrain the devices a job has
> access to. See https://slurm.schedmd.com/cgroup.conf.html
>
> My cgroup.conf is
>
> CgroupAutomount=yes
> AllowedDevicesFile="/usr/local/slurm/etc/cgroup_allowed_devices_file.conf"
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> ConstrainDevices=yes
> TaskAffinity=no
> CgroupMountpoint=/sys/fs/cgroup
>
> The ConstrainDevices=yes is the key to stopping jobs from having access to
> GPUs they didn't request.

I'm just curious about your AllowedDevicesFile parameter, which doesn't
seem to exist in the current Slurm versions 20.*. Can you confirm that
AllowedDevicesFile refers to an older Slurm version?
Device files are currently handled by the gres.conf file; see
https://slurm.schedmd.com/gres.conf.html
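
For illustration only, a minimal gres.conf sketch of declaring GPU device
files. The node names, GPU count, and device paths are hypothetical and
have to be adapted to the actual hardware:

```
# gres.conf (sketch; node names and device paths are assumptions)
# Four NVIDIA GPUs per node, exposed as /dev/nvidia0 .. /dev/nvidia3
NodeName=gpunode[01-02] Name=gpu File=/dev/nvidia[0-3]
```

For this to take effect, slurm.conf would also need GresTypes=gpu and a
matching Gres=gpu:4 on the corresponding NodeName line. With
ConstrainDevices=yes in cgroup.conf, the devices a job may access are then
constrained based on the GRES it was allocated.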
Thanks,
Ole