Hmm,

I'm a bit puzzled; everything looks OK as far as I can tell.

Did you try restarting slurmctld?
I had a case where users could not submit to the default partition anymore, because Slurm told them (if I remember correctly) something like
"wrong account/partition combination".
My first suspicion was my submission script, since I had changed it recently, but I could not find any error, and scontrol reconfig did not help.
But everything worked again after I restarted slurmctld.

Might be worth a try.
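
Just for completeness, roughly what I mean (this sketch assumes slurmctld is managed by systemd on the controller host; adjust to however you run it):

$ sudo systemctl restart slurmctld    # on the slurmctld host
$ sudo systemctl status slurmctld     # make sure it came back cleanly
$ scontrol ping                       # the controller should report UP

A plain stop/start of the daemon by hand would of course do the same thing.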


Best
Marcus

On 4/2/19 1:24 PM, Randall Radmer wrote:
Hi Marcus,

The following jobs are running or pending after I killed job 100816, which was running on computelab-134's T4:
100815 RUNNING computelab-134 gpu:gv100:1 None
100817 PENDING gpu:gv100:1 Resources
100818 PENDING gpu:tu104:1 Resources

$ scontrol -d show node computelab-134
NodeName=computelab-134 Arch=x86_64 CoresPerSocket=6
   CPUAlloc=6 CPUErr=0 CPUTot=12 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:gv100:1,gpu:tu104:1
   GresDrain=N/A
   GresUsed=gpu:gv100:1(IDX:0),gpu:tu104:0(IDX:N/A)
   NodeAddr=computelab-134 NodeHostName=computelab-134 Version=17.11
   OS=Linux 4.4.0-143-generic #169-Ubuntu SMP Thu Feb 7 07:56:38 UTC 2019
   RealMemory=64307 AllocMem=32148 FreeMem=61126 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=404938 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=test-backfill
   BootTime=2019-03-29T12:09:25 SlurmdStartTime=2019-04-01T11:34:35
   CfgTRES=cpu=12,mem=64307M,billing=12,gres/gpu=2,gres/gpu:gv100=1,gres/gpu:tu104=1
   AllocTRES=cpu=6,mem=32148M,gres/gpu=1,gres/gpu:gv100=1
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

$ scontrol -d show job 100815
JobId=100815 JobName=bash
   UserId=rradmer(27578) GroupId=hardware(30) MCS_label=N/A
   Priority=1 Nice=0 Account=cag QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:06:45 TimeLimit=02:00:00 TimeMin=N/A
   SubmitTime=2019-04-02T05:13:05 EligibleTime=2019-04-02T05:13:05
   StartTime=2019-04-02T05:13:05 EndTime=2019-04-02T07:13:05 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-04-02T05:13:05
   Partition=test-backfill AllocNode:Sid=computelab-frontend-02:7873
   ReqNodeList=computelab-134 ExcNodeList=(null)
   NodeList=computelab-134
   BatchHost=computelab-134
   NumNodes=1 NumCPUs=6 NumTasks=1 CPUs/Task=6 ReqB:S:C:T=0:0:*:*
   TRES=cpu=6,mem=32148M,node=1,billing=6,gres/gpu=1,gres/gpu:gv100=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
     Nodes=computelab-134 CPU_IDs=0-5 Mem=32148 GRES_IDX=gpu:gv100(IDX:0)
   MinCPUsNode=6 MinMemoryNode=32148M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=gpu:gv100:1 Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/bash
   WorkDir=/home/rradmer
   Power=

$ scontrol -d show job 100817
JobId=100817 JobName=bash
   UserId=rradmer(27578) GroupId=hardware(30) MCS_label=N/A
   Priority=1 Nice=0 Account=cag QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=02:00:00 TimeMin=N/A
   SubmitTime=2019-04-02T05:13:11 EligibleTime=2019-04-02T05:13:11
   StartTime=2019-04-02T07:13:05 EndTime=2019-04-02T09:13:05 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-04-02T05:20:44
   Partition=test-backfill AllocNode:Sid=computelab-frontend-03:21736
   ReqNodeList=computelab-134 ExcNodeList=(null)
   NodeList=(null) SchedNodeList=computelab-134
   NumNodes=1-1 NumCPUs=6 NumTasks=1 CPUs/Task=6 ReqB:S:C:T=0:0:*:*
   TRES=cpu=6,mem=32148M,node=1,gres/gpu=1,gres/gpu:gv100=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=6 MinMemoryNode=32148M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=gpu:gv100:1 Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/bash
   WorkDir=/home/rradmer
   Power=

$ scontrol -d show job 100818
JobId=100818 JobName=bash
   UserId=rradmer(27578) GroupId=hardware(30) MCS_label=N/A
   Priority=1 Nice=0 Account=cag QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=02:00:00 TimeMin=N/A
   SubmitTime=2019-04-02T05:13:12 EligibleTime=2019-04-02T05:13:12
   StartTime=2019-04-02T09:13:00 EndTime=2019-04-02T11:13:00 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-04-02T05:21:32
   Partition=test-backfill AllocNode:Sid=computelab-frontend-02:12826
   ReqNodeList=computelab-134 ExcNodeList=(null)
   NodeList=(null) SchedNodeList=computelab-134
   NumNodes=1-1 NumCPUs=6 NumTasks=1 CPUs/Task=6 ReqB:S:C:T=0:0:*:*
   TRES=cpu=6,mem=32148M,node=1,gres/gpu=1,gres/gpu:tu104=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=6 MinMemoryNode=32148M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=gpu:tu104:1 Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/bash
   WorkDir=/home/rradmer
   Power=


On Mon, Apr 1, 2019 at 11:24 PM Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:

    Dear Randall,

    could you please also provide


    scontrol -d show node computelab-134
    scontrol -d show job 100091
    scontrol -d show job 100094


    Best
    Marcus

    On 4/1/19 4:31 PM, Randall Radmer wrote:

    I can’t get backfill to work for a machine with two GPUs (one is
    a P4 and the other a T4).

    Submitting jobs works as expected: if the GPU I request is free,
    then my job runs, otherwise it goes into a pending state.  But if
    I have pending jobs for one GPU ahead of pending jobs for the
    other GPU, I see blocking issues.


    More specifically, I can create a case where I am running a job
    on each of the GPUs and have a pending job waiting for the P4
    followed by a pending job waiting for a T4.  I would expect that
    if I exit the running T4 job, then backfill would start the
    pending T4 job, even though it has to jump ahead of the pending P4
    job. This does not happen...


    The following shows my jobs after I exited from a running T4 job,
    which had ID 100092:

    $ squeue --noheader -u rradmer
    --Format=jobid,state,gres,nodelist,reason | sed 's/  */ /g' | sort

    100091 RUNNING gpu:gv100:1 computelab-134 None

    100093 PENDING gpu:gv100:1 Resources

    100094 PENDING gpu:tu104:1 Resources


    I can find no reason why 100094 doesn’t start running (I’ve
    waited up to an hour, just to make sure).


    System config info and log snippets shown below.


    Thanks much,

    Randy


    Node state corresponding to the squeue command shown above:

    $ scontrol show node computelab-134 | grep -i [gt]res

      Gres=gpu:gv100:1,gpu:tu104:1

      CfgTRES=cpu=12,mem=64307M,billing=12,gres/gpu=2,gres/gpu:gv100=1,gres/gpu:tu104=1

      AllocTRES=cpu=6,mem=32148M,gres/gpu=1,gres/gpu:gv100=1



    Slurm config follows:

    $ scontrol show conf | grep -Ei '(gres|^Sched|prio|vers)'

    AccountingStorageTRES = cpu,mem,energy,node,billing,gres/gpu,gres/gpu:gp100,gres/gpu:gp104,gres/gpu:gv100,gres/gpu:tu102,gres/gpu:tu104,gres/gpu:tu106

    GresTypes               = gpu

    PriorityParameters      = (null)

    PriorityDecayHalfLife   = 7-00:00:00

    PriorityCalcPeriod      = 00:05:00

    PriorityFavorSmall      = No

    PriorityFlags           =

    PriorityMaxAge          = 7-00:00:00

    PriorityUsageResetPeriod = NONE

    PriorityType            = priority/multifactor

    PriorityWeightAge       = 0

    PriorityWeightFairShare = 0

    PriorityWeightJobSize   = 0

    PriorityWeightPartition = 0

    PriorityWeightQOS       = 0

    PriorityWeightTRES      = (null)

    PropagatePrioProcess    = 0

    SchedulerParameters     = default_queue_depth=2000,bf_continue,bf_ignore_newly_avail_nodes,bf_max_job_test=1000,bf_window=10080,kill_invalid_depend

    SchedulerTimeSlice      = 30 sec

    SchedulerType           = sched/backfill

    SLURM_VERSION           = 17.11.9-2


    GPUs on node:

    $ nvidia-smi --query-gpu=index,name,gpu_bus_id --format=csv

    index, name, pci.bus_id

    0, Tesla T4, 00000000:82:00.0

    1, Tesla P4, 00000000:83:00.0

    The gres.conf file on the node:

    $ cat /etc/slurm/gres.conf

    Name=gpu Type=tu104 File=/dev/nvidia0 Cores=0,1,2,3,4,5

    Name=gpu Type=gp104 File=/dev/nvidia1 Cores=6,7,8,9,10,11
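
    For comparison, here is roughly what the GRES side of the node
    definition looks like from the controller's point of view. This
    NodeName line is a sketch reconstructed from the scontrol output
    above, not copied from our slurm.conf, so take it as illustrative:

    NodeName=computelab-134 CPUs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=64307 Gres=gpu:gv100:1,gpu:tu104:1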


    Recent lines from the SlurmSchedLogFile:

    $ sudo tail -3 slurm.sched.log

    [2019-04-01T08:14:23.727] sched: Running job scheduler

    [2019-04-01T08:14:23.728] sched: JobId=100093. State=PENDING.
    Reason=Resources. Priority=1. Partition=test-backfill.

    [2019-04-01T08:14:23.728] sched: JobId=100094. State=PENDING.
    Reason=Resources. Priority=1. Partition=test-backfill.


    Recent backfill lines from the SlurmctldLogFile:

    $ sudo grep backfill slurmctld.log  | tail -5

    [2019-04-01T08:16:53.281] backfill: beginning

    [2019-04-01T08:16:53.281] backfill test for JobID=100093 Prio=1
    Partition=test-backfill

    [2019-04-01T08:16:53.281] backfill test for JobID=100094 Prio=1
    Partition=test-backfill

    [2019-04-01T08:16:53.281] backfill: reached end of job queue

    [2019-04-01T08:16:53.281] backfill: completed testing 2(2) jobs,
    usec=707
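
    If more verbose scheduler output would help, my understanding is that
    the Backfill debug flag can be toggled at runtime roughly like this
    (a sketch based on the scontrol man page; I have not enabled it yet):

    $ sudo scontrol setdebugflags +Backfill
    $ sudo grep -i backfill slurmctld.log | tail -20
    $ sudo scontrol setdebugflags -Backfill   # switch it off again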



--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
