Loris,

You are correct! Instead of using nvidia-smi as a check, I confirmed the GPU allocation by printing out the environment variable CUDA_VISIBLE_DEVICES, and it was as expected.
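In case it helps anyone searching the archives, the check was along these lines (a rough sketch; the indices printed will be whichever devices Slurm actually assigned to the step):

  [abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu bash -c 'echo $CUDA_VISIBLE_DEVICES'
  0,1    (or whichever two indices Slurm handed to the job)

Wrapping the echo in bash -c with single quotes matters, otherwise the variable gets expanded on the submit host instead of inside the allocation.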
Thanks for your help!

On Thu, Jan 14, 2021 at 12:18 AM Loris Bennett <loris.benn...@fu-berlin.de> wrote:

> Hi Abhiram,
>
> Abhiram Chintangal <achintan...@berkeley.edu> writes:
>
> > Hello,
> >
> > I recently set up a small cluster at work using Warewulf/Slurm. Currently, I am not able to get the scheduler to work well with GPUs (GRES).
> >
> > While Slurm is able to filter by GPU type, it allocates all the GPUs on the node. See below:
> >
> > [abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> > index, name
> > 0, Tesla P100-PCIE-16GB
> > 1, Tesla P100-PCIE-16GB
> > 2, Tesla P100-PCIE-16GB
> > 3, Tesla P100-PCIE-16GB
> > [abhiram@whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> > index, name
> > 0, TITAN RTX
> > 1, TITAN RTX
> > 2, TITAN RTX
> > 3, TITAN RTX
> > 4, TITAN RTX
> > 5, TITAN RTX
> > 6, TITAN RTX
> > 7, TITAN RTX
> >
> > I am fairly new to Slurm and still figuring out my way around it. I would really appreciate any help with this.
> >
> > For your reference, I attached the slurm.conf and gres.conf files.
>
> I think this is expected, since nvidia-smi does not actually use the GPUs, but just returns information on their usage.
>
> A better test would be to run a simple job which really does run on, say, two GPUs and then, while the job is running, log into the GPU node and run
>
>   nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Hr./Mr.)
> ZEDAT, Freie Universität Berlin   Email loris.benn...@fu-berlin.de

--
Abhiram Chintangal
QB3 Nogales Lab
Bioinformatics Specialist @ Howard Hughes Medical Institute
University of California Berkeley
708D Stanley Hall, Berkeley, CA 94720
Phone (510) 666-3344