Sean,

Thanks for the clarification. I noticed that I was missing the "AllowedDevices" option in mine. After adding it, the GPU allocations started working. (Slurm version 18.08.8)

I was also incorrectly using "nvidia-smi" as a check.
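In case it helps anyone else, here is the kind of check I should have used instead (a minimal sketch; it assumes gres.conf has File= entries so Slurm sets CUDA_VISIBLE_DEVICES, and it reuses the partition and GRES names from the examples quoted below):

[abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu \
    bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'

With ConstrainDevices=yes the nvidia-smi -L output should list only the allocated GPUs; without it, CUDA_VISIBLE_DEVICES is still limited to the allocation, but nvidia-smi can see every device on the node, which is what misled me.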
Regards,
Abhiram

On Thu, Jan 14, 2021 at 12:22 AM Sean Crosby <scro...@unimelb.edu.au> wrote:

> Hi Abhiram,
>
> You need to configure cgroup.conf to constrain the devices a job has
> access to. See https://slurm.schedmd.com/cgroup.conf.html
>
> My cgroup.conf is
>
> CgroupAutomount=yes
> AllowedDevicesFile="/usr/local/slurm/etc/cgroup_allowed_devices_file.conf"
>
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> ConstrainDevices=yes
>
> TaskAffinity=no
>
> CgroupMountpoint=/sys/fs/cgroup
>
> ConstrainDevices=yes is the key to stopping jobs from having access to
> GPUs they didn't request.
>
> Sean
>
> --
> Sean Crosby | Senior DevOps/HPC Engineer and HPC Team Lead
> Research Computing Services | Business Services
> The University of Melbourne, Victoria 3010 Australia
>
>
> On Thu, 14 Jan 2021 at 18:36, Abhiram Chintangal <achintan...@berkeley.edu>
> wrote:
>
>> Hello,
>>
>> I recently set up a small cluster at work using Warewulf/Slurm.
>> Currently, I am not able to get the scheduler to work well with GPUs (GRES).
>>
>> While Slurm is able to filter by GPU type, it allocates all the GPUs on
>> the node. See below:
>>
>> [abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
>> index, name
>> 0, Tesla P100-PCIE-16GB
>> 1, Tesla P100-PCIE-16GB
>> 2, Tesla P100-PCIE-16GB
>> 3, Tesla P100-PCIE-16GB
>>
>> [abhiram@whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
>> index, name
>> 0, TITAN RTX
>> 1, TITAN RTX
>> 2, TITAN RTX
>> 3, TITAN RTX
>> 4, TITAN RTX
>> 5, TITAN RTX
>> 6, TITAN RTX
>> 7, TITAN RTX
>>
>> I am fairly new to Slurm and still figuring out my way around it. I would
>> really appreciate any help with this.
>>
>> For your reference, I have attached the slurm.conf and gres.conf files.
>>
>> Best,
>>
>> Abhiram
>>
>> --
>> Abhiram Chintangal
>> QB3 Nogales Lab
>> Bioinformatics Specialist @ Howard Hughes Medical Institute
>> University of California Berkeley
>> 708D Stanley Hall, Berkeley, CA 94720
>> Phone (510) 666-3344
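For reference, the file named by AllowedDevicesFile above declares devices that every job may access regardless of its allocation when device constraining is enabled. A minimal sketch, based on the example file shipped with the Slurm source (the exact entries are site-specific assumptions; /dev/nvidia* should be left out so GPU access stays governed by ConstrainDevices):

/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*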