Re: [slurm-users] AutoDetect=nvml throwing an error message

2021-04-16 Thread Stephan Roth
Hi Cristóbal, Under Debian Stretch/Buster I had to set LDFLAGS=-L/usr/lib/x86_64-linux-gnu/nvidia/current for configure to find the NVML shared library. Best, Stephan On 15.04.21 19:46, Cristóbal Navarro wrote: Hi Michael, Thanks, Indeed I don't have it. Slurm must not have detected it. I do
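A minimal sketch of the LDFLAGS workaround described above, for anyone hitting the same problem. The helper name nvml_ldflags is hypothetical, and the assumption that the library file name starts with libnvidia-ml.so should be checked against your own system. The sketch only prints the configure command it would suggest; it does not run it.

```shell
# Hypothetical helper: print a -L flag for a directory if it contains
# the NVML shared library, and fail otherwise.
nvml_ldflags() {
    dir="$1"
    # Assumption: the library is named libnvidia-ml.so, possibly with a
    # version suffix such as libnvidia-ml.so.1.
    for f in "$dir"/libnvidia-ml.so*; do
        if [ -e "$f" ]; then
            printf '%s\n' "-L$dir"
            return 0
        fi
    done
    return 1
}

# Directory taken from the message above (Debian Stretch/Buster).
NVML_LIBDIR=/usr/lib/x86_64-linux-gnu/nvidia/current

if flags=$(nvml_ldflags "$NVML_LIBDIR"); then
    # Print the command to run from the Slurm source tree.
    echo "LDFLAGS=$flags ./configure"
else
    echo "NVML library not found in $NVML_LIBDIR"
fi
```

On other distributions the directory will differ, so NVML_LIBDIR needs to point at wherever the driver package installs the library.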

Re: [slurm-users] AutoDetect=nvml throwing an error message

2021-04-15 Thread Cristóbal Navarro
Hi Michael, Thanks, Indeed I don't have it. Slurm must not have detected it. I double checked and NVML is installed (libnvidia-ml-dev for Ubuntu). Here is some output, including the relevant paths for nvml. Is it possible to tell the slurm compilation to check these paths for nvml? best *NVML PKG

Re: [slurm-users] AutoDetect=nvml throwing an error message

2021-04-15 Thread Michael Di Domenico
the error message sounds like the slurm build wasn't able to find the nvml devel packages when you compiled the source. if you look where you installed slurm, in lib/slurm you should have a gpu_nvml.so. do you? On Wed, Apr 14, 2021 at 5:53 PM Cristóbal Navarro wrote: > > typing error, should be --> **
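The check suggested above can be sketched as a small shell function. The function name has_nvml_plugin and the SLURM_PREFIX variable are hypothetical, and the /usr/local default is only an assumption; substitute the prefix Slurm was actually installed under.

```shell
# Hypothetical helper: report whether the NVML GPU plugin was built
# and installed under the given Slurm prefix.
has_nvml_plugin() {
    prefix="$1"
    if [ -f "$prefix/lib/slurm/gpu_nvml.so" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

# SLURM_PREFIX is an assumption; /usr/local is only a placeholder default.
has_nvml_plugin "${SLURM_PREFIX:-/usr/local}"
```

If this reports "missing", the build most likely did not find the NVML development files, which matches the diagnosis in the message above.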

Re: [slurm-users] AutoDetect=nvml throwing an error message

2021-04-14 Thread Cristóbal Navarro
typing error, should be --> **located at /usr/include/nvml.h** On Wed, Apr 14, 2021 at 5:47 PM Cristóbal Navarro < cristobal.navarr...@gmail.com> wrote: > Hi community, > I have set up the configuration files as mentioned in the documentation, > but the slurmd of the GPU-compute node fails with t

[slurm-users] AutoDetect=nvml throwing an error message

2021-04-14 Thread Cristóbal Navarro
Hi community, I have set up the configuration files as mentioned in the documentation, but the slurmd of the GPU-compute node fails with the following error shown in the log. After reading the slurm documentation, it is not entirely clear to me how to properly set up GPU autodetection for the gres.