I am building a cluster exclusively with dynamic nodes, which all boot
up over the network from the same system image (Debian 12); so far there
is just one physical node, as well as a VM that I have used for the
initial tests:
# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
all*
However, slurmd doesn't list the GPU even when I set SlurmdDebug=debug2
in the config file (though I do see other debug2 entries in the log).
I've set 'AutoDetect=nvml' in gres.conf and 'GresTypes=gpu' in
slurm.conf; shouldn't that be enough by now? Or is my build of slurm
still missing something?
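For reference, GPU autodetection needs both of these entries working together; a minimal sketch (everything beyond the two directives themselves is a comment, and file placement follows the usual slurm layout):

```
# gres.conf (on the node image): let slurmd query the driver via NVML
AutoDetect=nvml

# slurm.conf: tell the controller that "gpu" is a valid GRES type
GresTypes=gpu
```

Note that AutoDetect=nvml can only work if slurmd itself was built with NVML support. Running `slurmd -G` on the node prints the GRES configuration slurmd actually detected, which is a quick way to tell whether autodetection is working at all, independent of the controller.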
On 19/07/2023 12:26, Timo Rothenpieler wrote:
…nvidia's page:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
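Once the CUDA headers are installed, configure may still need to be pointed at them; a sketch, assuming a default CUDA install prefix (`/usr/local/cuda` is an assumption, adjust to wherever your toolkit actually lives):

```
# Point slurm's configure at the tree that provides nvml.h and
# libnvidia-ml.so; the path below is an assumption about the install.
./configure --with-nvml=/usr/local/cuda
```

After configure succeeds, it is worth confirming that its summary reports NVML support before rebuilding and reinstalling slurmd on the node image.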
On 19/07/2023 11:47, Jan Andersen wrote:
I'm trying to build slurm with nvml support, but configure doesn't find it:
root@zorn:~/slurm-23.02.3# ./configure --with-nvml
...
checking for hwloc installation... /usr
checking for nvml.h... no
checking for nvmlInit in -lnvidia-ml... yes
configure: error: unable to locate libnvidia-ml.so and/or nvml.h
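The output above shows that the linker found libnvidia-ml, but the nvml.h header is missing, which is why configure bails out. The file check can be mimicked with a short script to see what is present on the box before re-running configure (the search directories below are assumptions based on common Debian and CUDA layouts, not slurm's exact search list):

```python
#!/usr/bin/env python3
"""Sketch: look for the two files slurm's configure needs for NVML
support.  The directory lists are assumptions for a Debian/CUDA box."""
import glob
import os

# Typical locations for the NVML header (assumed, not exhaustive).
HEADER_DIRS = [
    "/usr/include",
    "/usr/local/cuda/include",
    "/usr/local/cuda/targets/x86_64-linux/include",
]

# Typical locations for the NVML shared library (assumed, not exhaustive).
LIB_DIRS = [
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib64",
    "/usr/local/cuda/lib64",
]

def find_nvml():
    """Return the first hit for nvml.h and libnvidia-ml.so*, else None."""
    found = {"nvml.h": None, "libnvidia-ml.so": None}
    for d in HEADER_DIRS:
        path = os.path.join(d, "nvml.h")
        if os.path.isfile(path):
            found["nvml.h"] = path
            break
    for d in LIB_DIRS:
        hits = glob.glob(os.path.join(d, "libnvidia-ml.so*"))
        if hits:
            found["libnvidia-ml.so"] = sorted(hits)[0]
            break
    return found

if __name__ == "__main__":
    for name, path in find_nvml().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

If the header is missing but the library is present, that matches the configure error exactly, and installing the CUDA development headers (per the NVIDIA guide linked above) should resolve it.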