Yes, you are right. AutoDetect=off in the gres.conf file solved the
problem! Thank you very much!!


Best wishes

Achim

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Groner, 
Rob <rug...@psu.edu>
Sent: Friday, October 21, 2022 16:26
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] gres/gpu count reported lower than configured

I've encountered that many times, and for me, it was always related to 
AutoDetect and the nvidia-ml library.  Does your slurmd log contain a line like 
"debug:  skipping GRES for NodeName=t-gc-1202  AutoDetect=nvml"?  I see that 
you didn't specifically set AutoDetect to nvml in gres.conf, but maybe you 
should set AutoDetect=off just to be sure.
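
For reference, a minimal sketch of what that could look like in gres.conf. 
The node name is taken from the log line above; the GPU count and the 
/dev/nvidia device paths are assumptions for illustration only, so adjust them 
to the real hardware. The number of devices listed here has to match the 
Gres=gpu:N value given for the node in slurm.conf.

```
# gres.conf (sketch; device paths and GPU count are assumed, not from the thread)
# Disable NVML autodetection so only the explicit lines below are used
AutoDetect=off
# Four GPUs assumed here; must agree with Gres=gpu:4 for this node in slurm.conf
NodeName=t-gc-1202 Name=gpu File=/dev/nvidia[0-3]
```

After changing gres.conf, slurmd on the node needs to be restarted for the new 
settings to take effect.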

If "sinfo" shows a node as "inval", then setting it to Resume (not Idle) won't 
work until you figure out why Slurm thinks the node configuration is invalid.
