Yep, they’re installed, and I can get all the GPU info from nvidia-smi.

Thanks,

Mike
________________________________
From: Dj Merrill <d...@deej.net>
Sent: Friday, November 11, 2022 3:41:56 PM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>; Michael Lewis <mike.le...@queensu.ca>
Subject: Re: [slurm-users] NVML not found when Slurm was configured.

At the risk of asking a silly question, do you have the NVIDIA drivers installed
on the machine?

Can you type "nvidia-smi" at the command line and view the GPU info?
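nvidia-smi talks to the GPUs through NVML's runtime library, so if it works, the driver and that library are in place. That alone does not show that the build-time pieces (nvml.h and libnvidia-ml.so) are installed. A minimal Python sketch of the same runtime check, assuming the library is named libnvidia-ml.so.1 (the usual Linux name) and using a hypothetical helper name, calls NVML's nvmlInit_v2, nvmlDeviceGetCount_v2 and nvmlShutdown through ctypes:

```python
import ctypes

NVML_SUCCESS = 0  # nvmlReturn_t value meaning "success"


def nvml_device_count(lib_name="libnvidia-ml.so.1"):
    """Hypothetical helper: return how many GPUs NVML reports, or None.

    Assumption: the NVML runtime library is loadable as lib_name.
    """
    try:
        nvml = ctypes.CDLL(lib_name)
        init = nvml.nvmlInit_v2
        get_count = nvml.nvmlDeviceGetCount_v2
        shutdown = nvml.nvmlShutdown
    except (OSError, AttributeError):
        return None  # library or expected symbols missing (driver not installed?)

    if init() != NVML_SUCCESS:
        return None  # library loaded, but NVML could not reach the driver
    try:
        count = ctypes.c_uint(0)
        if get_count(ctypes.byref(count)) != NVML_SUCCESS:
            return None
        return count.value
    finally:
        shutdown()  # always release NVML after a successful init


if __name__ == "__main__":
    n = nvml_device_count()
    print("NVML unavailable" if n is None else f"NVML sees {n} GPU(s)")
```

If this prints a GPU count, the runtime side is fine, and the remaining question is whether Slurm itself was built with NVML support.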

-Dj


On 11/11/22 15:34, Michael Lewis wrote:

Unfortunately this didn’t work out for me, or I’m simply doing it wrong. When
the current users hop off the system, I’ll do some more troubleshooting. Any
other insight or tips to steer me in the right direction would be greatly
appreciated.



Mike



From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Michael Lewis <mike.le...@queensu.ca>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
Date: Friday, November 11, 2022 at 10:01 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] NVML not found when Slurm was configured.



Thanks Rob! No, I just grabbed it through apt. I’ll try that now.



Mike



From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of "Groner, Rob" <rug...@psu.edu>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
Date: Friday, November 11, 2022 at 9:32 AM
To: "slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] NVML not found when Slurm was configured.



Hi Mike,



I can't tell whether or not you're compiling Slurm on your own. You will have
to if you want the NVML functionality.



On RedHat 8, I had to install cuda-nvml-devel-11-7, so find the equivalent
package on Ubuntu: basically, whatever package includes nvml.h and
libnvidia-ml.so. Then modify your configure statement when building Slurm to
add "--with-nvml". Check the configure output, because it may still not find
it (it didn't on our system, because we installed the devel package to a
non-standard location). If that's the case, just change it to
--with-nvml=<path to nvml lib dir>. Then it should all work.
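Before rerunning configure it can help to confirm where nvml.h and libnvidia-ml.so actually live, and afterwards whether the build picked NVML up. The sketch below is only a rough aid: the search directories are hypothetical defaults to adjust for your install, the helper names are made up for illustration, and it assumes configure records success as `#define HAVE_NVML 1` in the generated config.h.

```python
from pathlib import Path

# Hypothetical search locations; adjust for where your NVML devel package lands.
HEADER_DIRS = ["/usr/include", "/usr/local/cuda/include"]
LIB_DIRS = ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64",
            "/usr/local/cuda/lib64/stubs"]


def find_first(dirs, filename):
    """Return the first directory in dirs containing filename, else None."""
    for d in dirs:
        if (Path(d) / filename).is_file():  # follows symlinks
            return str(d)
    return None


def locate_nvml(header_dirs=HEADER_DIRS, lib_dirs=LIB_DIRS):
    """Return (dir holding nvml.h, dir holding libnvidia-ml.so); None if absent."""
    return (find_first(header_dirs, "nvml.h"),
            find_first(lib_dirs, "libnvidia-ml.so"))


def build_has_nvml(config_h="config.h"):
    """Assumption: configure writes '#define HAVE_NVML 1' when NVML is found."""
    try:
        text = Path(config_h).read_text()
    except FileNotFoundError:
        return False
    return any(line.split()[:3] == ["#define", "HAVE_NVML", "1"]
               for line in text.splitlines())


if __name__ == "__main__":
    header_dir, lib_dir = locate_nvml()
    print("nvml.h:", header_dir or "not found")
    print("libnvidia-ml.so:", lib_dir or "not found")
    print("config.h defines HAVE_NVML:", build_has_nvml())
```

Run it from the Slurm build directory after configure. A library directory found outside the usual system paths is a candidate for the --with-nvml=<path to nvml lib dir> form described above.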



I'll note that once it's all set up, your gres.conf becomes just "<nodenames>
AutoDetect=nvml".



G'luck.



rob



________________________________

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Michael Lewis <mike.le...@queensu.ca>
Sent: Friday, November 11, 2022 9:12 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] NVML not found when Slurm was configured.




Hello Everyone,



I'm new here and very new to Slurm, so hopefully someone can shed some light on
this for me. I'm in the process of setting up a single-node Slurm environment
with an NVIDIA A100. When I try to start slurmd, I keep getting the error "We
were configured to autodetect nvml functionality, but we weren't able to find
that lib when Slurm was configured." When I remove GresTypes=gpu from
slurm.conf, slurmd starts up fine and can queue up and run jobs. The CUDA
toolkit is installed along with the NVIDIA Management Library (NVML). I went as
far as removing Slurm and reinstalling it to see if it would pick it up. No go.
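Going by its own wording, that error means the Slurm binaries were built without NVML, so reinstalling the same package would not change it. One rough way to inspect an existing (e.g. apt-installed) build is to look for an NVML GPU plugin in Slurm's plugin directory. The plugin file name gpu_nvml.so, the assumption that it is only built when NVML was found, and the candidate directories are all assumptions that may not hold for every Slurm version; the PluginDir setting in slurm.conf gives the real location. The helper name is hypothetical.

```python
from pathlib import Path

# Assumed plugin locations (Debian/Ubuntu package path first, then source builds).
DEFAULT_PLUGIN_DIRS = ["/usr/lib/x86_64-linux-gnu/slurm-wlm",
                       "/usr/lib/slurm", "/usr/local/lib/slurm"]


def find_nvml_plugin(plugin_dirs=DEFAULT_PLUGIN_DIRS, name="gpu_nvml.so"):
    """Hypothetical helper: return the path of the NVML GPU plugin, else None.

    Assumption: Slurm only builds this plugin when configure found NVML.
    """
    for d in plugin_dirs:
        candidate = Path(d) / name
        if candidate.exists():
            return str(candidate)
    return None


if __name__ == "__main__":
    path = find_nvml_plugin()
    if path:
        print(f"NVML plugin found: {path}")
    else:
        print("No gpu_nvml.so found; this build probably lacks NVML support.")
```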



OS: Ubuntu 20.04
slurm.conf: GresTypes=gpu is added
gres.conf: AutoDetect=nvml Name=gpu Type=a100 File=/dev/nvidia0 COREs=0,1



I’ve searched around and see that many others have run into this, but I haven’t
found a fix yet. Any help would be greatly appreciated.



Thanks,



Mike