Chris,

Upon further testing this morning I see the job is assigned two different job IDs, something I wasn't expecting. This led me down the road of thinking the output was incorrect.
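For anyone following along, a pack submission of this shape (not the exact check_nodes.sbatch, just a minimal sketch using the 18.08-style "#SBATCH packjob" separator and "srun --pack-group", with placeholder srun steps) would look roughly like:

#!/bin/bash
# Minimal sketch of a two-component heterogeneous (pack) job.
# Not the actual check_nodes.sbatch -- resource shapes taken from the
# scontrol output below; the srun lines are only placeholders.

# Component 0: a single task on the hugemem partition
#SBATCH --job-name=CHECK_NODE
#SBATCH --account=arcc
#SBATCH --time=01:00:00
#SBATCH --partition=teton-hugemem
#SBATCH --nodes=1
#SBATCH --ntasks=1

#SBATCH packjob

# Component 1: nine full nodes on the regular teton partition
# (options after "packjob" apply to this second component; the shared
# ones are repeated here just to be explicit)
#SBATCH --job-name=CHECK_NODE
#SBATCH --account=arcc
#SBATCH --time=01:00:00
#SBATCH --partition=teton
#SBATCH --nodes=9
#SBATCH --ntasks-per-node=32

# Each srun step selects its component with --pack-group
srun --pack-group=0 hostname
srun --pack-group=1 hostname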
scontrol on a heterogeneous (pack) job will show multiple job IDs for the job, so the output just wasn't what I was expecting.

Jeff

[jrlang@tlog1 TEST_CODE]$ sbatch check_nodes.sbatch
Submitted batch job 2611773

[jrlang@tlog1 TEST_CODE]$ squeue | grep jrlang
         2611773+1     teton CHECK_NO   jrlang  R       0:10      9 t[439-447]
         2611773+0 teton-hug CHECK_NO   jrlang  R       0:10      1 thm03

[jrlang@tlog1 TEST_CODE]$ pestat | grep jrlang
  t439    teton          alloc  32  32  0.02*   128000   119594  2611774 jrlang
  t440    teton          alloc  32  32  0.02*   128000   119542  2611774 jrlang
  t441    teton          alloc  32  32  0.01*   128000   119760  2611774 jrlang
  t442    teton          alloc  32  32  0.01*   128000   121491  2611774 jrlang
  t443    teton          alloc  32  32  0.02*   128000   119893  2611774 jrlang
  t444    teton          alloc  32  32  0.02*   128000   119607  2611774 jrlang
  t445    teton          alloc  32  32  0.03*   128000   119626  2611774 jrlang
  t446    teton          alloc  32  32  0.01*   128000   119882  2611774 jrlang
  t447    teton          alloc  32  32  0.01*   128000   120037  2611774 jrlang
  thm03   teton-hugemem  mix     1  32  0.01*  1024000  1017845  2611773 jrlang

[jrlang@tlog1 TEST_CODE]$ scontrol show job 2611773
JobId=2611773 PackJobId=2611773 PackJobOffset=0 JobName=CHECK_NODE
   PackJobIdSet=2611773-2611774
   UserId=jrlang(10024903) GroupId=jrlang(10024903) MCS_label=N/A
   Priority=1004 Nice=0 Account=arcc QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:01:59 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2019-04-24T09:03:00 EligibleTime=2019-04-24T09:03:00
   AccrueTime=2019-04-24T09:03:00
   StartTime=2019-04-24T09:03:20 EndTime=2019-04-24T10:03:20 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-04-24T09:03:20
   Partition=teton-hugemem AllocNode:Sid=tlog1:24498
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=thm03
   BatchHost=thm03
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=1000M,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=1000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/pfs/tsfs1/home/jrlang/TEST_CODE/check_nodes.sbatch
   WorkDir=/pfs/tsfs1/home/jrlang/TEST_CODE
   StdErr=/pfs/tsfs1/home/jrlang/TEST_CODE/slurm-2611773.out
   StdIn=/dev/null
   StdOut=/pfs/tsfs1/home/jrlang/TEST_CODE/slurm-2611773.out
   Power=

JobId=2611774 PackJobId=2611773 PackJobOffset=1 JobName=CHECK_NODE
   PackJobIdSet=2611773-2611774
   UserId=jrlang(10024903) GroupId=jrlang(10024903) MCS_label=N/A
   Priority=1086 Nice=0 Account=arcc QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:01:59 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2019-04-24T09:03:00 EligibleTime=2019-04-24T09:03:00
   AccrueTime=2019-04-24T09:03:00
   StartTime=2019-04-24T09:03:20 EndTime=2019-04-24T10:03:20 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-04-24T09:03:20
   Partition=teton AllocNode:Sid=tlog1:24498
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=t[439-447]
   BatchHost=t439
   NumNodes=9 NumCPUs=288 NumTasks=288 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=288,mem=288000M,node=9,billing=288
   Socks/Node=* NtasksPerN:B:S:C=32:0:*:* CoreSpec=*
   MinCPUsNode=32 MinMemoryCPU=1000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/pfs/tsfs1/home/jrlang/TEST_CODE/check_nodes.sbatch
   WorkDir=/pfs/tsfs1/home/jrlang/TEST_CODE
   StdErr=/pfs/tsfs1/home/jrlang/TEST_CODE/slurm-2611774.out
   StdIn=/dev/null
   StdOut=/pfs/tsfs1/home/jrlang/TEST_CODE/slurm-2611774.out
   Power=
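As an aside (going by the heterogeneous job documentation, and assuming this Slurm version behaves the same), scontrol should also accept the jobid+offset form that squeue prints, which limits the output to a single component, e.g.:

[jrlang@tlog1 TEST_CODE]$ scontrol show job 2611773+1

which should print only the JobId=2611774 record above.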
-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Chris Samuel
Sent: Tuesday, April 23, 2019 7:39 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] scontrol for a heterogenous job appears incorrect

On 23/4/19 3:02 pm, Jeffrey R. Lang wrote:

> Looking at the nodelist and the NumNodes they are both incorrect. They
> should show the first node and then the additional nodes assigned.

You're only looking at the second of the two pack jobs for your submission, could they be assigned in the first one of the pack jobs instead?

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA