Do one more pass through your config, making sure to s/1080GTX/1080gtx/ and s/K20/k20/.
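A minimal sketch of that pass with sed. It assumes a sed that accepts -i.bak (GNU sed does) and that the file to fix is passed as an argument; fix_gres_case is a hypothetical name. It rewrites both type names in place and keeps a .bak copy. Note that it also changes any other occurrence of those strings, including in comments.

```bash
#!/bin/bash
# Hypothetical helper: lowercase the GPU type names in a gres.conf so they
# match the Gres=gpu:1080gtx / Gres=gpu:k20 names used in slurm.conf.
# Writes the edited file in place and leaves the original as <file>.bak.
fix_gres_case() {
  sed -i.bak -e 's/1080GTX/1080gtx/g' -e 's/K20/k20/g' "$1"
}
```

For example, running fix_gres_case /etc/slurm/gres.conf on each GPU node before restarting the daemons.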

Then shut down all slurmd and slurmctld daemons, start slurmctld, and then start slurmd.
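The restart order above can be sketched as a small shell function. This assumes systemd units named slurmd and slurmctld, ssh access to every host, and host names passed as arguments; run_or_echo and restart_slurm are hypothetical names. By default (DRY_RUN=1) it only prints the commands it would run.

```bash
#!/bin/bash
# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run_or_echo() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: restart_slurm <controller-host> <node> [<node> ...]
# Order: stop every slurmd, stop slurmctld, start slurmctld, start every slurmd.
restart_slurm() {
  local ctld_host=$1 node
  shift  # remaining arguments are the compute nodes running slurmd
  for node in "$@"; do run_or_echo ssh "$node" systemctl stop slurmd; done
  run_or_echo ssh "$ctld_host" systemctl stop slurmctld
  run_or_echo ssh "$ctld_host" systemctl start slurmctld
  for node in "$@"; do run_or_echo ssh "$node" systemctl start slurmd; done
}
```

Run it once with the default dry run to check the order, then with DRY_RUN=0 to actually execute the commands.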


I find it less confusing to have a global gres.conf file. I haven't used a list (nvidia[0-1]), mainly because I want to specify the cores to use for each GPU.

gres.conf would look something like...

NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=k20 File=/dev/nvidia0 Cores=0
NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=k20 File=/dev/nvidia1 Cores=1
NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080gtx File=/dev/nvidia0 Cores=0
NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080gtx File=/dev/nvidia1 Cores=1

which can be distributed to all nodes.
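If you keep one line per GPU, a short shell loop can print them instead of typing each one. This is a sketch with a hypothetical helper name; it simply pairs /dev/nvidiaN with Cores=N, mirroring the example above, which may not reflect your real CPU/GPU affinity (nvidia-smi topo -m shows it).

```bash
#!/bin/bash
# Hypothetical helper (not a Slurm tool): print one gres.conf line per GPU.
# Usage: gen_gres_lines <nodelist> <type> <gpu-count>
# Assumption: GPU N is bound to Cores=N, as in the example above.
gen_gres_lines() {
  local nodes=$1 type=$2 count=$3 i=0
  while [ "$i" -lt "$count" ]; do
    printf 'NodeName=%s Name=gpu Type=%s File=/dev/nvidia%d Cores=%d\n' \
      "$nodes" "$type" "$i" "$i"
    i=$((i + 1))
  done
}
```

For example, gen_gres_lines 'tiger[01,05,10,15,20]' 1080gtx 2 prints the two 1080gtx lines shown above.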

-b


On 12/04/2018 09:55 AM, Lou Nicotra wrote:
Brian, the specific node does not show any gres...
root@panther02 slurm# scontrol show partition=tiger_1
PartitionName=tiger_1
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=tiger[01-22]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=1056 TotalNodes=22 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

root@panther02 slurm#  scontrol show node=tiger11
NodeName=tiger11 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=0 CPUTot=48 CPULoad=11.50
   AvailableFeatures=HyperThread
   ActiveFeatures=HyperThread
   Gres=(null)
   NodeAddr=X.X.X.X NodeHostName=tiger11 Version=18.08
   OS=Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015
   RealMemory=1 AllocMem=0 FreeMem=269695 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=tiger_1,compute_1
   BootTime=2018-04-02T13:30:12 SlurmdStartTime=2018-12-03T16:13:22
   CfgTRES=cpu=48,mem=1M,billing=48
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

So, something is not set up correctly... Could it be an 18.x bug?
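One way to see at a glance which nodes the controller thinks have no GPUs is to filter the scontrol output. A sketch, assuming the NodeName= / Gres= layout shown above; node_gres is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: read `scontrol show node` output on stdin and print
# "<node> <Gres value>", so nodes registered without GPUs show "(null)".
node_gres() {
  awk '
    /^NodeName=/   { split($1, kv, "="); node = kv[2] }
    /^[ \t]*Gres=/ { sub(/^[ \t]*Gres=/, ""); print node, $0 }
  '
}
```

For example, scontrol show node | node_gres would print "tiger11 (null)" for the node shown above.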

Thanks.


On Tue, Dec 4, 2018 at 9:31 AM Lou Nicotra <lnico...@interactions.com <mailto:lnico...@interactions.com>> wrote:

    Thanks Michael. I will try 17.x as I also could not see anything wrong
    with my settings... Will report back afterwards...

    Lou

    On Tue, Dec 4, 2018 at 9:11 AM Michael Di Domenico <mdidomeni...@gmail.com
    <mailto:mdidomeni...@gmail.com>> wrote:

        Unfortunately, someone smarter than me will have to help further. I'm
        not sure I see anything specifically wrong. The one thing I might try
        is backing the software down to a 17.x release series. I recently
        tried 18.x and had some issues. I can't say whether it'll be any
        different, but you might be exposing an undiagnosed bug in the 18.x
        branch.
        On Mon, Dec 3, 2018 at 4:17 PM Lou Nicotra <lnico...@interactions.com <mailto:lnico...@interactions.com>> wrote:
        >
        > Made the change in the local gres.conf file on the server and restarted
        > slurmd and slurmctld on the master... Unfortunately, same error...
        >
        > Distributed the corrected gres.conf to all k20 servers and restarted slurmd
        > and slurmctld... Still the same error...
        >
        > On Mon, Dec 3, 2018 at 4:04 PM Brian W. Johanson <bjoha...@psc.edu <mailto:bjoha...@psc.edu>> wrote:
        >>
        >> Is that a lowercase k in k20 specified in the batch script and
        >> NodeName, and an uppercase K specified in gres.conf?
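To test the case-mismatch theory without eyeballing every file, a small shell check can compare the Type= values in gres.conf against the types declared as Gres=gpu:<type>:<count> in slurm.conf, using an exact, case-sensitive match. This is a sketch under simple assumptions (one record per line, '#' comments, typed GPU GRES only); check_gres_types and its file arguments are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper (not part of Slurm): report gres.conf GPU types that do
# not exactly match (case-sensitively) a type declared in slurm.conf.
# Usage: check_gres_types <slurm.conf> <gres.conf>; returns 1 on any mismatch.
check_gres_types() {
  local slurm_conf=$1 gres_conf=$2 declared used type status=0

  # Types declared in slurm.conf, e.g. "Gres=gpu:k20:2" -> "k20".
  declared=$(grep -v '^[[:space:]]*#' "$slurm_conf" \
    | grep -o 'Gres=gpu:[^:[:space:]]*' | cut -d: -f2 | sort -u)

  # Types used in gres.conf, e.g. "Type=K20" -> "K20" (comment lines skipped).
  used=$(grep -v '^[[:space:]]*#' "$gres_conf" \
    | grep -o 'Type=[^[:space:]]*' | cut -d= -f2 | sort -u)

  for type in $used; do
    if ! printf '%s\n' "$declared" | grep -qxF "$type"; then
      echo "mismatch: gres.conf Type=$type not declared in slurm.conf"
      status=1
    fi
  done
  return "$status"
}
```

Run against the slurm.conf GRES lines and the original tiger11 gres.conf quoted below (Type=K20), it prints "mismatch: gres.conf Type=K20 not declared in slurm.conf" and returns 1.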
        >>
        >> On 12/03/2018 09:13 AM, Lou Nicotra wrote:
        >>
        >> Hi All, I have recently set up a Slurm cluster with my servers and
        >> I'm running into an issue while submitting GPU jobs. It has something
        >> to do with the GRES configuration, but I just can't seem to figure out
        >> what is wrong. Non-GPU jobs run fine.
        >>
        >> After submitting a batch job, I get the following error:
        >> sbatch: error: Batch job submission failed: Invalid Trackable RESource (TRES) specification
        >>
        >> My batch job is as follows:
        >> #!/bin/bash
        >> #SBATCH --partition=tiger_1   # partition name
        >> #SBATCH --gres=gpu:k20:1
        >> #SBATCH --gres-flags=enforce-binding
        >> #SBATCH --time=0:20:00  # wall clock limit
        >> #SBATCH --output=gpu-%J.txt
        >> #SBATCH --account=lnicotra
        >> module load cuda
        >> python gpu1
        >>
        >> Where gpu1 is a GPU test script that runs correctly when invoked
        >> via python. The tiger_1 partition has servers with GPUs, a mix of
        >> 1080GTX and K20, as specified in slurm.conf.
        >>
        >> I have defined GRES resources in the slurm.conf file:
        >> # GPU GRES
        >> GresTypes=gpu
        >> NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
        >> NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
        >>
        >> And have a local gres.conf on the servers containing GPUs...
        >> lnicotra@tiger11 ~# cat /etc/slurm/gres.conf
        >> # GPU Definitions
        >> # NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=K20 File=/dev/nvidia[0-1]
        >> Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1
        >>
        >> and a similar one for the 1080GTX
        >> # GPU Definitions
        >> # NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080GTX File=/dev/nvidia[0-1]
        >> Name=gpu Type=1080GTX File=/dev/nvidia[0-1] Cores=0,1
        >>
        >> The account manager seems to know about the GPUs...
        >> lnicotra@tiger11 ~# sacctmgr show tres
        >>     Type            Name     ID
        >> -------- --------------- ------
        >>      cpu                      1
        >>      mem                      2
        >>   energy                      3
        >>     node                      4
        >>  billing                      5
        >>       fs            disk      6
        >>     vmem                      7
        >>    pages                      8
        >>     gres             gpu   1001
        >>     gres         gpu:k20   1002
        >>     gres     gpu:1080gtx   1003
        >>
        >> Can anyone point out what I am missing?
        >>
        >> Thanks!
        >> Lou
        >>
        >>
        >> --
        >>
        >> Lou Nicotra
        >>
        >> IT Systems Engineer - SLT
        >>
        >> Interactions LLC
        >>
        >> o:  908-673-1833
        >>
        >> m: 908-451-6983
        >>
        >> lnico...@interactions.com <mailto:lnico...@interactions.com>
        >>
        >> www.interactions.com <http://www.interactions.com>
        >>
        >>
        
*******************************************************************************
        >>
        >> This e-mail and any of its attachments may contain Interactions LLC
        proprietary information, which is privileged, confidential, or subject
        to copyright belonging to the Interactions LLC. This e-mail is
        intended solely for the use of the individual or entity to which it is
        addressed. If you are not the intended recipient of this e-mail, you
        are hereby notified that any dissemination, distribution, copying, or
        action taken in relation to the contents of and attachments to this
        e-mail is strictly prohibited and may be unlawful. If you have
        received this e-mail in error, please notify the sender immediately
        and permanently delete the original and any copy of this e-mail and
        any printout. Thank You.
        >>
        >>
        
*******************************************************************************
        >>
        >>


