[slurm-users] How to fix a node in state=inval?

2023-09-01 Thread Jan Andersen
I am building a cluster exclusively with dynamic nodes, which all boot over the network from the same system image (Debian 12). So far there is just one physical node, plus a VM that I have used for the initial tests:
# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
all*

Re: [slurm-users] configure script can't find nvml.h or libnvidia-ml.so

2023-07-21 Thread Jan Andersen
t list the GPU even when I set SlurmdDebug=debug2 in the config file (but I see other entries for debug2). I've set 'AutoDetect=nvml' in gres.conf and 'GresTypes=gpu' in slurm.conf; shouldn't it work by now? Or is my build of slurm still not good? On 19/07/2023 12

Re: [slurm-users] configure script can't find nvml.h or libnvidia-ml.so

2023-07-19 Thread Jan Andersen
's page: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
On 19/07/2023 12:26, Timo Rothenpieler wrote:
On 19/07/2023 11:47, Jan Andersen wrote:
I'm trying to build slurm with nvml support, but configure doesn't find it:
root@zorn:~/slurm-23.02.3# ./configure --with-nv

[slurm-users] configure script can't find nvml.h or libnvidia-ml.so

2023-07-19 Thread Jan Andersen
I'm trying to build slurm with nvml support, but configure doesn't find it:
root@zorn:~/slurm-23.02.3# ./configure --with-nvml
...
checking for hwloc installation... /usr
checking for nvml.h... no
checking for nvmlInit in -lnvidia-ml... yes
configure: error: unable to locate libnvidia-ml.so and/o
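
In the configure output above, the header check (nvml.h) fails while the link check against -lnvidia-ml succeeds, which suggests the library is installed but the header is not on the default include path. Before re-running configure it can help to confirm where, if anywhere, both files live. The following is only a minimal sketch: the function name find_nvml and the search roots are hypothetical, not part of Slurm or the NVIDIA tooling, and the locations on your system may differ.

```python
from pathlib import Path


def find_nvml(prefixes):
    """Search the given directories recursively for nvml.h and libnvidia-ml.so*.

    Returns a (header_path, library_path) tuple; either entry is None
    when the corresponding file was not found under any prefix.
    """
    header = None
    library = None
    for prefix in map(Path, prefixes):
        if not prefix.is_dir():
            continue  # skip roots that do not exist on this machine
        if header is None:
            header = next(
                (p for p in prefix.rglob("nvml.h") if p.is_file()), None
            )
        if library is None:
            # Matches libnvidia-ml.so as well as versioned names such as .so.1
            library = next(
                (p for p in prefix.rglob("libnvidia-ml.so*") if p.is_file()), None
            )
    return header, library


if __name__ == "__main__":
    # Hypothetical search roots; adjust to where the driver and CUDA toolkit are installed.
    hdr, lib = find_nvml(["/usr/include", "/usr/lib", "/usr/local/cuda"])
    print("nvml.h:", hdr or "not found")
    print("libnvidia-ml.so:", lib or "not found")
```

If the header turns up under a non-default prefix, check ./configure --help for your Slurm version to see whether --with-nvml accepts a path argument that can point at that prefix.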