I am in the early stages of setting up my first SLURM cluster, and
I am trying to understand why jobs are pending while resources appear
to be available.

These are the pending jobs:

# squeue -P --sort=-p,i --states=PD -O "JobID:.12 ,Partition:9 ,StateCompact:2 ,Priority:.12 ,ReasonList"
       JOBID PARTITION ST     PRIORITY NODELIST(REASON)
       38692 rtx8000   PD 0.0046530945 (Resources)
       38693 rtx8000   PD 0.0046530945 (Priority)
       38694 rtx8000   PD 0.0046530906 (Priority)
       38695 rtx8000   PD 0.0046530866 (Priority)
       38696 rtx8000   PD 0.0046530866 (Priority)
       38697 rtx8000   PD 0.0000208867 (Priority)
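The PRIORITY column above is squeue's normalized value (the raw value
Priority=19989863 in the scontrol output below divided by 2^32 comes out
to the same 0.00465). If the per-factor breakdown is useful I can pull
it with sprio, e.g.:

# sprio -l -j 38692,38693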

The job at the top is as follows:

Submission command line:

  sbatch -p rtx8000 -G 1 -c 4 -t 12:00:00 --mem=47G \
   -o /cluster/batch/iman/%j.out --wrap='cmd .....'
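A dry run of the same request with --test-only (which just prints the
scheduler's estimated start time without actually submitting) might also
be informative here:

  sbatch --test-only -p rtx8000 -G 1 -c 4 -t 12:00:00 --mem=47G \
   -o /cluster/batch/iman/%j.out --wrap='cmd .....'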

# scontrol show job=38692
JobId=38692 JobName=wrap
   UserId=iman(8084) GroupId=iman(8084) MCS_label=N/A
   Priority=19989863 Nice=0 Account=imanlab QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=12:00:00 TimeMin=N/A
   SubmitTime=2021-01-21T13:05:02 EligibleTime=2021-01-21T13:05:02
   AccrueTime=2021-01-21T13:05:02
   StartTime=2021-01-22T01:05:02 EndTime=2021-01-22T13:05:02 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-01-21T14:04:32
   Partition=rtx8000 AllocNode:Sid=mlsc-head:974529
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=rtx-06
   NumNodes=1-1 NumCPUs=4 NumTasks=1 CPUs/Task=4 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=47G,node=1,billing=8,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=4 MinMemoryNode=47G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/autofs/homes/008/iman
   StdErr=/cluster/batch/iman/38692.out
   StdIn=/dev/null
   StdOut=/cluster/batch/iman/38692.out
   Power=
   TresPerJob=gpu:1
   MailUser=(null) MailType=NONE
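Two things stand out to me in that output: SchedNodeList=rtx-06, so the
scheduler has already picked a node, and StartTime=2021-01-22T01:05:02,
which is exactly 12 hours after submission, as if backfill has planned
the job behind a running 12-hour job rather than starting it now. The
planned start is also queryable with:

# squeue --start -j 38692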

And that node does appear to have enough free resources (CPU, memory,
GPUs) for the job in this partition:

# scontrol show node=rtx-06
NodeName=rtx-06 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=16 CPUTot=32 CPULoad=5.77
   AvailableFeatures=intel,cascade,rtx8000
   ActiveFeatures=intel,cascade,rtx8000
   Gres=gpu:quadro_rtx_8000:10(S:0)
   NodeAddr=rtx-06 NodeHostName=rtx-06 Version=20.02.3
   OS=Linux 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Thu Oct 22 00:20:22 UTC 2020
   RealMemory=1546000 AllocMem=146432 FreeMem=1420366 Sockets=2 Boards=1
   MemSpecLimit=2048
   State=MIXED ThreadsPerCore=1 TmpDisk=6000000 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=rtx8000
   BootTime=2020-12-30T10:35:34 SlurmdStartTime=2020-12-30T10:37:21
   CfgTRES=cpu=32,mem=1546000M,billing=99,gres/gpu=10
   AllocTRES=cpu=16,mem=143G,gres/gpu=5
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
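By my arithmetic that leaves 16 of 32 CPUs, 5 of 10 GPUs, and roughly
1.37T of memory (CfgTRES mem=1546000M minus AllocMem=146432M)
unallocated, comfortably more than the cpu=4,mem=47G,gres/gpu=1 the job
requests. The same view in one line, if easier to read (field names per
sinfo --Format):

# sinfo -n rtx-06 -O "NodeList:10 ,CPUsState:14 ,Memory:12 ,FreeMem:12 ,GresUsed:30"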

# squeue --partition=rtx8000 --states=R -O "NodeList:10 ,JobID:.8 ,Partition:10,tres-alloc,tres-per-job" -w rtx-06
NODELIST      JOBID PARTITION  TRES_ALLOC           TRES_PER_JOB
rtx-06        38687 rtx8000    cpu=4,mem=47G,node=1 gpu:1
rtx-06        37267 rtx8000    cpu=3,mem=24G,node=1 gpu:1
rtx-06        37495 rtx8000    cpu=3,mem=24G,node=1 gpu:1
rtx-06        38648 rtx8000    cpu=3,mem=24G,node=1 gpu:1
rtx-06        38646 rtx8000    cpu=3,mem=24G,node=1 gpu:1
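Those five jobs sum to cpu=16, mem=143G, gres/gpu=5, which matches the
node's AllocTRES above exactly. (A throwaway check of the CPU sum,
assuming the tres-alloc field fits in 60 characters:)

# squeue -w rtx-06 --states=R -h -O "tres-alloc:60" | tr ',' '\n' | \
    awk -F= '$1=="cpu"{c+=$2} END{print c}'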

In case this is needed, here is the partition definition:

# scontrol show part=rtx8000
PartitionName=rtx8000
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=04:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=rtx-[04-08]
   PriorityJobFactor=1 PriorityTier=4 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=160 TotalNodes=5 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRESBillingWeights=CPU=1.24,Mem=0.02G,Gres/gpu=3.0
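(As a sanity check on those weights: the pending job's billing=8 is
consistent, since 4*1.24 + 47*0.02 + 1*3.0 = 8.9, apparently truncated
to an integer, and the node's billing=99 works out the same way:
32*1.24 + ~1510*0.02 + 10*3.0 = ~99.9.)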


Scheduling parameters from slurm.conf are:

EnforcePartLimits=ALL
LaunchParameters=mem_sort,slurmstepd_memlock_all,test_exec
MaxJobCount=300000
MaxArraySize=10000
DefMemPerCPU=10240
DefCpuPerGPU=1
DefMemPerGPU=10240
GpuFreqDef=medium
CompleteWait=0
EpilogMsgTime=3000000
InactiveLimit=60
KillWait=30
UnkillableStepTimeout=180
ResvOverRun=UNLIMITED
MinJobAge=600
Waittime=5
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

SchedulerParameters=\
default_queue_depth=1500,\
partition_job_depth=10,\
bf_continue,\
bf_interval=30,\
bf_resolution=600,\
bf_window=11520,\
bf_max_job_part=0,\
bf_max_job_user=10,\
bf_max_job_test=100000,\
bf_max_job_start=1000,\
bf_ignore_newly_avail_nodes,\
enable_user_top,\
pack_serial_at_end,\
nohold_on_prolog_fail,\
permit_job_expansion,\
preempt_strict_order,\
preempt_youngest_first,\
reduce_completing_frag,\
max_rpc_cnt=16

DependencyParameters=kill_invalid_depend
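If it would help, I can enable backfill debugging on the controller for
a few cycles and capture why it skips the job (the log path below is
just an example; it is whatever SlurmctldLogFile points to on your
system):

# scontrol setdebugflags +backfill
   (wait a few bf_interval cycles, i.e. a minute or two)
# grep -i backfill /var/log/slurm/slurmctld.log | tail -50
# scontrol setdebugflags -backfill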


So, any idea why job 38692 is not being started on node rtx-06?
(38687 is already running there, as shown above.)

---------------------------------------------------------------
Paul Raines                     http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street     Charlestown, MA 02129            USA



