This started working for me this morning. I have no idea why it started to work. Maybe it was multiple restarts of the various daemons that did it.
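For anyone who hits the same error later: one thing that can be checked offline is whether the GPU count declared on the `NodeName` line in slurm.conf agrees with the number of device lines in the node's gres.conf. Below is a minimal sketch of that comparison in Python, using the two configuration snippets from the quoted message below. The helper names (`parse_kv`, `declared_gres`, `listed_gres`, `mismatches`) are made up for illustration and are not part of Slurm, and the parser assumes one device per gres.conf line and a `name:type:count` Gres field, so it ignores bracketed `File=` ranges and other syntax. With the configuration quoted below both sides come to 4, so this check alone does not explain the failure.

```python
from collections import Counter

# Configuration lines copied from the quoted message below.
SLURM_CONF_NODE = (
    "NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4"
)
GRES_CONF = """\
Name=gpu Type=gp100 File=/dev/nvidia0
Name=gpu Type=gp100 File=/dev/nvidia1
Name=gpu Type=gp100 File=/dev/nvidia2
Name=gpu Type=gp100 File=/dev/nvidia3
"""


def parse_kv(line):
    """Split a 'Key=Value Key=Value' line into a dict (hypothetical helper)."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def declared_gres(node_line):
    """Counts taken from the Gres= field, keyed by (name, type)."""
    counts = Counter()
    gres = parse_kv(node_line).get("Gres", "")
    for item in filter(None, gres.split(",")):
        # Assumption: each item looks like name:type:count.
        name, gres_type, count = item.split(":")
        counts[(name, gres_type)] += int(count)
    return counts


def listed_gres(gres_conf_text):
    """Counts one device per gres.conf line, keyed by (name, type)."""
    counts = Counter()
    for line in gres_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        kv = parse_kv(line)
        counts[(kv["Name"], kv.get("Type", ""))] += 1
    return counts


def mismatches(node_line, gres_conf_text):
    """Return {(name, type): (declared, listed)} where the two counts differ."""
    declared = declared_gres(node_line)
    listed = listed_gres(gres_conf_text)
    keys = declared.keys() | listed.keys()
    return {k: (declared[k], listed[k]) for k in keys if declared[k] != listed[k]}


if __name__ == "__main__":
    print("declared:", dict(declared_gres(SLURM_CONF_NODE)))
    print("listed:  ", dict(listed_gres(GRES_CONF)))
    print("mismatches:", mismatches(SLURM_CONF_NODE, GRES_CONF))
```

An empty `mismatches` result only says the two files are consistent with each other; it says nothing about what slurmctld currently has loaded, which is what `scontrol show node` reports.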

-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Brian W. Johanson
Sent: Tuesday, February 4, 2020 1:35 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

Please include the output for:

scontrol show node=liqidos-dean-node1
scontrol show partition=Partition_you_are_attempting_to_submit_to

and any other #SBATCH lines submitted with the failing job.

On 2/4/20 9:42 AM, dean.w.schu...@gmail.com wrote:
> I've already restarted slurmctld and slurmd on all nodes. Still get the same
> problem.
>
> -----Original Message-----
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Marcus Wagner
> Sent: Tuesday, February 4, 2020 2:31 AM
> To: slurm-users@lists.schedmd.com
> Subject: Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu
>
> Hi Dean,
>
> could you please try to restart the slurmctld?
>
> This usually helps on our site.
> Never saw this with gres happening, but many other times.
> This is, why we restart slurmctld once a day by a cron job.
>
>
> Best
> Marcus
>
> On 2/4/20 12:59 AM, Dean Schulze wrote:
>> When I run an sbatch script with the line
>>
>> #SBATCH --gres=gpu:gp100:1
>>
>> it runs. When I change it to
>>
>> #SBATCH --gres=gpu:gp100:3
>>
>> it fails with "Requested node configuration is not available". But I
>> have a node with 4 gp100s available. Here's my slurm.conf:
>>
>> NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4
>>
>> That node has a gres.conf with these lines:
>>
>> Name=gpu Type=gp100 File=/dev/nvidia0
>> Name=gpu Type=gp100 File=/dev/nvidia1
>> Name=gpu Type=gp100 File=/dev/nvidia2
>> Name=gpu Type=gp100 File=/dev/nvidia3
>>
>> The character devices all exist in /dev.
>>
>> What's the controller complaining about?
> --
> Marcus Wagner, Dipl.-Inf.
>
> IT Center
> Abteilung: Systeme und Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wag...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de