...and... you need to restart slurmctld when you change a NodeName line. "scontrol reconfigure" doesn't do the trick.
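
For what it's worth, here is a rough sketch of the restart-and-verify step. It assumes slurmctld runs under systemd and that you run it as root on the controller host; the function names gres_count and restart_and_check are made up for illustration and are not part of Slurm.

```shell
#!/bin/sh
# Sketch only. Helper names are hypothetical; assumes systemd manages slurmctld.

# Pull the trailing count out of a GRES string such as
# "gpu:quadro_8000:3" or "gpu:quadro_8000:2(S:0-1)".
gres_count() {
    # Drop any "(...)" suffix, then everything up to the last colon.
    printf '%s\n' "$1" | sed -e 's/(.*$//' -e 's/^.*://'
}

# Restart the controller after editing a NodeName line, then print
# the Gres field the controller now reports for the given node.
restart_and_check() {
    node="$1"
    systemctl restart slurmctld || return 1
    scontrol show node "$node" | grep -i 'gres='
}
```

Calling restart_and_check gpu01 should then show the Gres line for the node, and sinfo -N -n gpu01 -o "%N %G" gives the same information per node. If the count still doesn't match slurm.conf after the restart, the slurmctld log is probably the next place to look.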

On Mon, Jul 26, 2021 at 12:49 PM Fulcomer, Samuel <samuel_fulco...@brown.edu> wrote:

> If you have a dual-root PCIe system you may need to specify the CPU/core
> affinity in gres.conf.
>
> On Mon, Jul 26, 2021 at 12:07 PM Jason Simms <sim...@lafayette.edu> wrote:
>
>> Hello all,
>>
>> I have a GPU node with 3 identical GPUs (we started with two and recently
>> added the third). Running nvidia-smi correctly shows that all three are
>> recognized. My gres.conf file has only this line:
>>
>> NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3
>>
>> And the relevant lines in slurm.conf are:
>>
>> NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
>> RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
>>
>> As far as I can tell, all of this is fine (and we had no issues when we
>> only had the initial two GPUs in the system). However, now when I run sinfo
>> -o %G (which as I understand will report the total number of gres
>> resources available), this is the output:
>>
>> GRES
>> (null)
>> gpu:quadro_8000:2
>>
>> Is this saying that it doesn't recognize the third card? Any suggestions?
>> As always, thank you for your help!
>>
>> Warmest regards,
>> Jason
>>
>> --
>> *Jason L. Simms, Ph.D., M.P.H.*
>> Manager of Research and High-Performance Computing
>> XSEDE Campus Champion
>> Lafayette College
>> Information Technology Services
>> 710 Sullivan Rd | Easton, PA 18042
>> Office: 112 Skillman Library
>> p: (610) 330-5632
>>