If this is a dual-root PCIe system (GPUs attached to different CPU sockets), you may need to specify the CPU/core affinity for each GPU in gres.conf, using the Cores= parameter.
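
Something along these lines might work as a starting point. It is only a sketch: I am assuming the first two cards hang off socket 0 and the new one off socket 1, and that the cores are numbered 0-15 and 16-31 (based on Sockets=2 CoresPerSocket=16 in your slurm.conf). Those core ranges are guesses, so check the real topology with `nvidia-smi topo -m` before using them.

```
# gres.conf (sketch; the Cores= ranges below are assumptions, not measured values)
# Assumed: /dev/nvidia0 and /dev/nvidia1 are attached to socket 0 (cores 0-15)
NodeName=gpu01 Name=gpu Type=quadro_8000 File=/dev/nvidia[0-1] Cores=0-15
# Assumed: /dev/nvidia2 is attached to socket 1 (cores 16-31)
NodeName=gpu01 Name=gpu Type=quadro_8000 File=/dev/nvidia2 Cores=16-31
```

After changing gres.conf you would normally restart slurmd on the node (and slurmctld) so that the new GRES definition is picked up.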

On Mon, Jul 26, 2021 at 12:07 PM Jason Simms <sim...@lafayette.edu> wrote:

> Hello all,
>
> I have a GPU node with 3 identical GPUs (we started with two and recently
> added the third). Running nvidia-smi correctly shows that all three are
> recognized. My gres.conf file has only this line:
>
> NodeName=gpu01 File=/dev/nvidia[0-2] Type=quadro_8000 Name=gpu Count=3
>
> And the relevant lines in slurm.conf are:
>
> NodeName=gpu01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
> RealMemory=189900 State=UNKNOWN Gres=gpu:quadro_8000:3
>
> As far as I can tell, all of this is fine (and we had no issues when we
> only had the initial two GPUs in the system). However, now when I run sinfo
> -o %G (which as I understand will report the total number of gres
> resources available), this is the output:
>
> GRES
> (null)
> gpu:quadro_8000:2
>
> Is this saying that it doesn't recognize the third card? Any suggestions?
> As always, thank you for your help!
>
> Warmest regards,
> Jason
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College
> Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632