I replaced an Nvidia V100 with a T4. Now Slurm thinks there is no GPU present:
$ sudo scontrol show node fabricnode2
NodeName=fabricnode2 Arch=x86_64 CoresPerSocket=6
   CPUAlloc=0 CPUTot=12 CPULoad=0.02
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:nvidia:1
   NodeAddr=fabricnode2 NodeHostName=fabricnode2 Version=19.05.4
   OS=Linux 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020
   RealMemory=7802 AllocMem=0 FreeMem=6828 Sockets=1 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=debug
   BootTime=2020-04-27T10:24:18 SlurmdStartTime=2020-04-27T10:39:53
   CfgTRES=cpu=12,mem=7802M,billing=12
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0 ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=gres/gpu count reported lower than configured (0 < 1) [root@2020-04-27T10:34:25]

The GPU is definitely there: I can run CUDA binaries on it, and nvidia-smi shows it as present. I have also rebooted the node, restarted slurmctld, and run scontrol reconfigure, yet the node stays drained with the reason shown above. How does Slurm determine whether a GPU is present? Whatever check it performs is getting it wrong.
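For context, here is a minimal sketch of the GRES configuration I believe is in play, plus the checks I plan to run on the node to compare what slurmd detects against what slurm.conf declares. The gres.conf line and the /dev/nvidia0 device path are written from memory rather than pasted from the node, so treat them as assumptions:

# slurm.conf (same copy on the controller and the node):
# declares the GRES type and the node's GPU count
GresTypes=gpu
NodeName=fabricnode2 CPUs=12 RealMemory=7802 Gres=gpu:nvidia:1

# gres.conf on fabricnode2: tells slurmd which device file backs that GRES
# (the /dev/nvidia0 path is an assumption; adjust to whatever actually exists)
Name=gpu Type=nvidia File=/dev/nvidia0

# on the node: does the device file exist, and does the driver see the T4?
ls -l /dev/nvidia*
nvidia-smi -L

# stop the slurmd service, then run slurmd in the foreground with verbose
# logging to see which GRES it actually reports to the controller
sudo slurmd -D -vvv

# once the reported count matches the configured count, the drained node
# still has to be resumed by hand
sudo scontrol update NodeName=fabricnode2 State=RESUME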