Please post the output of 'scontrol show job 1908239', and also the
output of 'scontrol show node' for one of the idle compute nodes.
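Something along these lines should do it. This is only a sketch: the job ID is the one from your post, and compute-4-18 is an assumption on my part (it is listed as idle in your sinfo output, but any currently idle node would work).

```shell
#!/bin/sh
# Job ID taken from the original post.
JOBID=1908239
# Assumed node name: compute-4-18 appears as idle in the sinfo output below;
# substitute any node that is idle when you run this.
NODE=compute-4-18

if command -v scontrol >/dev/null 2>&1; then
    # Full job record, including the requested resources and pending reason.
    scontrol show job "$JOBID"
    # Full node record, including state, configured CPUs/memory, and features.
    scontrol show node "$NODE"
else
    echo "scontrol not found; run this on a host with the Slurm client tools" >&2
fi
```

Comparing what the job requests with what the node actually offers is usually the quickest way to see why a job pends on "Resources".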
Prentice
On 3/19/21 8:12 PM, Bernstein, Noam CIV USN NRL (6393) Washington DC
(USA) wrote:
Can anyone explain why job 1908239 is not running, or what else I can
check? squeue gives the reason as "Resources", and the estimated start
time is always the current time, no matter when I run "squeue --start",
even though nodes are available according to "sinfo ... state=idle".
It's only a 1-minute job, so it's not because the nodes won't be
available for long enough to be backfilled.
The Slurm version is admittedly a bit old: 19.05.7.
> squeue -p n2019 --state=PD -l
Fri Mar 19 20:09:17 2021
JOBID    PARTITION  NAME      USER      STATE    TIME  TIME_LIMI    NODES  NODELIST(REASON)
1908239  n2019      LiCu_SPA  bernstei  PENDING  0:00  1:00         1      (Resources)
1908236  n2019      cspbbr3-  jllyons   PENDING  0:00  2-16:00:00   2      (Priority)
1908227  n2019      Cy3_dupl  yckim     PENDING  0:00  33-08:00:00  4      (Priority)
1908231  n2019,n20  sGC_Fe_N  bernstei  PENDING  0:00  7-00:00:00   4      (JobHeldUser)
1908238  n2019      LiCu_SPA  bernstei  PENDING  0:00  1:00:00      1      (JobHeldUser)
> squeue -j 1908239 --start
JOBID    PARTITION  NAME      USER      ST  START_TIME           NODES  SCHEDNODES         NODELIST(REASON)
1908239  n2019      LiCu_SPA  bernstei  PD  2021-03-19T20:09:17  1      compute-4-[18-19]  (Resources)
> sinfo -p n2019 state=idle
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
n2019      up     infinite   43     alloc  compute-4-[0-11,13-17,20-26,28-39,41-47]
n2019      up     infinite   5      idle   compute-4-[12,18-19,27,40]