Hello Slurm Admins,

I have set up Slurm for a GPU cluster. The basic installation without gres/gpu works well. Now I am trying to add the GPUs to the Slurm configuration. All attempts have failed so far, and sinfo -R always reports the reason

    gres/gpu count reported lower than configured (0 < 2)

With nvidia-smi the GPUs are found, and running jobs on them works well.

I tried to get rid of the above error by setting the node state back to IDLE with scontrol. That attempt also failed, with the error message

    slurm_update error: Invalid node state specified

I also ran slurmd on the GPU node at debug5 level. From slurmd.log I can see that gres.conf is found and that gres_gpu.so / gpu_generic.so are loaded.
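For reference, these are roughly the commands I used (reproduced from memory, so the exact flags may not be identical):

    # show the reason the node is marked down/drained
    sinfo -R

    # my attempt to clear the state; this is the call that returned
    # "slurm_update error: Invalid node state specified"
    scontrol update NodeName=hpc-node14 State=IDLE

    # slurmd run in the foreground on the GPU node with increased verbosity
    # (debug5 set via SlurmdDebug=debug5 in slurm.conf)
    slurmd -D -vvvvv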
My Slurm configuration is as follows:

slurm.conf:

    GresTypes=gpu
    NodeName=hpc-node14 CPUs=128 RealMemory=515815 Sockets=2 CoresPerSocket=64 ThreadsPerCore=1 Gres=gpu:2 State=UNKNOWN

gres.conf:

    NodeName=hpc-node[01-14] Name=gpu File=/dev/nvidia[0-1]

Does anyone know what is wrong and how to fix this problem?

Thank you.

Best wishes
Achim