Brian, the specific node does not show any gres...
root@panther02 slurm# scontrol show partition=tiger_1
PartitionName=tiger_1
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=tiger[01-22]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=1056 TotalNodes=22 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
root@panther02 slurm# scontrol show node=tiger11
NodeName=tiger11 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=0 CPUTot=48 CPULoad=11.50
   AvailableFeatures=HyperThread
   ActiveFeatures=HyperThread
   Gres=(null)
   NodeAddr=X.X.X.X NodeHostName=tiger11 Version=18.08
   OS=Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015
   RealMemory=1 AllocMem=0 FreeMem=269695 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=tiger_1,compute_1
   BootTime=2018-04-02T13:30:12 SlurmdStartTime=2018-12-03T16:13:22
   CfgTRES=cpu=48,mem=1M,billing=48
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
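
For reference, a quick way to double-check what slurmd itself detects on
the node (a sketch, assuming default paths; this is not output from this
cluster):

root@tiger11 ~# slurmd -C        # prints the node configuration slurmd would report
root@tiger11 ~# slurmd -D -vvv   # runs slurmd in the foreground with verbose
                                 # logging; gres.conf parse errors show up here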
So, something is not set up correctly: the node reports Gres=(null) even
though slurm.conf defines gres for it... Could it be an 18.x bug?
Thanks.
On Tue, Dec 4, 2018 at 9:31 AM Lou Nicotra <lnico...@interactions.com> wrote:
Thanks, Michael. I will try 17.x, as I also could not see anything
wrong with my settings... Will report back afterwards...
Lou
On Tue, Dec 4, 2018 at 9:11 AM Michael Di Domenico <mdidomeni...@gmail.com> wrote:
Unfortunately, someone smarter than me will have to help further. I'm
not sure I see anything specifically wrong. The one thing I might try
is backing the software down to the 17.x release series. I recently
tried 18.x and had some issues. I can't say whether it'll be any
different, but you might be exposing an undiagnosed bug in the 18.x
branch.
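
One other thing that might be worth double-checking (an assumption on my
part, not something confirmed in this thread): if slurm.conf restricts
the tracked TRES via AccountingStorageTRES, the typed gres entries would
need to be listed there as well, e.g.

AccountingStorageTRES=gres/gpu,gres/gpu:k20,gres/gpu:1080gtx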
On Mon, Dec 3, 2018 at 4:17 PM Lou Nicotra <lnico...@interactions.com> wrote:
>
> Made the change in the local gres.conf file and restarted slurmd, and
> slurmctld on the master... Unfortunately, same error...
>
> Distributed the corrected gres.conf to all k20 servers and restarted
> slurmd and slurmctld... Still the same error...
>
> On Mon, Dec 3, 2018 at 4:04 PM Brian W. Johanson
<bjoha...@psc.edu <mailto:bjoha...@psc.edu>> wrote:
>>
>> Is that a lowercase k in k20 specified in the batch script and
>> NodeName, and an uppercase K specified in gres.conf?
>>
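>> If typed gres names are matched case-sensitively, all three places
>> would need to agree. A sketch for the k20 nodes, using lowercase
>> throughout (not a confirmed fix):
>>
>> # slurm.conf
>> NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
>> # gres.conf on those nodes
>> Name=gpu Type=k20 File=/dev/nvidia[0-1] Cores=0,1
>> # batch script
>> #SBATCH --gres=gpu:k20:1
>>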
>> On 12/03/2018 09:13 AM, Lou Nicotra wrote:
>>
>> Hi All, I have recently set up a slurm cluster with my servers and
>> I'm running into an issue while submitting GPU jobs. It has
>> something to do with the gres configuration, but I just can't seem
>> to figure out what is wrong. Non-GPU jobs run fine.
>>
>> The error, after submitting a batch job, is as follows:
>> sbatch: error: Batch job submission failed: Invalid Trackable
>> RESource (TRES) specification
>>
>> My batch job is as follows:
>> #!/bin/bash
>> #SBATCH --partition=tiger_1 # partition name
>> #SBATCH --gres=gpu:k20:1
>> #SBATCH --gres-flags=enforce-binding
>> #SBATCH --time=0:20:00 # wall clock limit
>> #SBATCH --output=gpu-%J.txt
>> #SBATCH --account=lnicotra
>> module load cuda
>> python gpu1
>>
>> Where gpu1 is a GPU test script that runs correctly when invoked
>> directly via python. The tiger_1 partition has servers with GPUs, a
>> mix of 1080GTX and K20, as specified in slurm.conf.
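>>
>> A quick way to exercise the same gres request interactively (a sketch;
>> it assumes nvidia-smi is installed on the k20 nodes):
>>
>> srun --partition=tiger_1 --gres=gpu:k20:1 nvidia-smi
>>
>> If this fails with the same TRES error, the problem is likely in the
>> gres/TRES configuration rather than in the batch script itself.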
>>
>> I have defined GRES resources in the slurm.conf file:
>> # GPU GRES
>> GresTypes=gpu
>> NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
>> NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
>>
>> And I have a local gres.conf on the servers containing GPUs...
>> lnicotra@tiger11 ~# cat /etc/slurm/gres.conf
>> # GPU Definitions
>> # NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=K20 File=/dev/nvidia[0-1]
>> Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1
>>
>> and a similar one for the 1080GTX
>> # GPU Definitions
>> # NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080GTX File=/dev/nvidia[0-1]
>> Name=gpu Type=1080GTX File=/dev/nvidia[0-1] Cores=0,1
>>
>> The account manager seems to know about the GPUs...
>> lnicotra@tiger11 ~# sacctmgr show tres
>>     Type            Name     ID
>> -------- --------------- ------
>>      cpu                      1
>>      mem                      2
>>   energy                      3
>>     node                      4
>>  billing                      5
>>       fs            disk      6
>>     vmem                      7
>>    pages                      8
>>     gres             gpu   1001
>>     gres         gpu:k20   1002
>>     gres     gpu:1080gtx   1003
>>
>> Can anyone point out what I am missing?
>>
>> Thanks!
>> Lou
--
Lou Nicotra
IT Systems Engineer - SLT
Interactions LLC
o: 908-673-1833
m: 908-451-6983
lnico...@interactions.com
www.interactions.com