Did you pass --with-nvml when you configured the build?  Go back to your 
config.log and check whether configure ever reported finding nvml.h.

If not, then you'll need to make sure you have the right NVIDIA/CUDA packages 
installed on the host you're building Slurm on, and you might have to specify 
--with-nvml=<path to nvml install> if it's not in a standard location.
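
As a rough sketch of that check (the helper name nvml_lines is made up for 
illustration, and the exact wording of the configure messages may differ 
between Slurm versions), something like this lists every NVML-related line 
that configure logged:

```shell
# Hypothetical helper: print every line mentioning nvml in a configure log.
# Defaults to ./config.log, so run it from the Slurm build directory.
nvml_lines() {
    grep -i -n 'nvml' "${1:-config.log}"
}

# Usage, from the directory where configure was run:
#   nvml_lines config.log
#
# If nvml.h was not found, re-run configure with an explicit location and
# rebuild. The path below is only an assumed example, not a known default
# for your system:
#   ./configure --with-nvml=/usr/local/cuda
```

No output at all suggests configure never looked for NVML; a "no" result for 
nvml.h suggests the headers are missing or in a non-standard location.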

Rob

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Ravi 
Konila <ravibh...@gmail.com>
Sent: Thursday, November 30, 2023 9:06 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Autodetect of nvml is not working in gres.conf

Hello,

My gres.conf has AutoDetect=nvml.
When I restart the slurmd service I get:

fatal: We were configured to autodetect nvml functionality, but we weren't able 
to find that lib when Slurm was configured.

I looked through a few links and the slurm-users email archives for a fix, but 
could not understand much.

Can someone help me with this one? I am using a DGX A100 server, which has 4 
A100 80GB GPUs.

With Warm Regards
Ravi Konila
