Hi list, we have a new cluster set up with Bright Cluster Manager.  We're looking 
into a support contract with them, but trying to get community support in the 
meantime.  I'm sure things were working when the cluster was delivered, but I 
provisioned an additional node and now the scheduler isn't quite working right.

The new node I provisioned had a slightly different disk layout, so I had to 
provision it a bit differently from the other nodes.  I also made some changes 
to the Slurm queue within Bright Cluster Manager to account for the additional 
resources, but I must have messed something up.  Now only one job is running on 
the new node, when there should be 16, and no jobs are running on the original 
nodes at all.
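
From what I've read so far, I'm guessing the node and partition definitions 
are the first things to compare against what Bright pushed out, and I can post 
the output of any of these if it's useful (n010 is the new node, defq is the 
partition from the squeue output below):

    sinfo -N -l                    # per-node state, CPU count, drain/down reason
    scontrol show node n010        # CPUs vs CPUAlloc and State for the new node
    scontrol show partition defq   # node list and OverSubscribe for the partition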

When I run squeue, this is the output:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               156      defq cromwell smrtuser PD       0:00      1 (Resources)
               157      defq cromwell smrtuser PD       0:00      1 (Priority)
               ...
               203      defq cromwell smrtuser PD       0:00      1 (Priority)
               155      defq cromwell smrtuser  R      39:35      1 n010

Job 155 is running on the "new" node, and as you can see the rest are pending 
rather than running anywhere.  Once job 155 finishes, the next one starts, so 
the queue is moving, just one job at a time.
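
In case the per-job detail matters, I can also post what scontrol reports for 
one of the pending jobs; as far as I can tell its Reason field matches the 
(Resources)/(Priority) column above:

    scontrol show job 156    # full record for a pending job, including Reason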

I'm new to all this, so I'm not sure where to start troubleshooting.  I'd like 
to get the other jobs running so our work finishes in a timely manner, and to 
figure out why only one job runs when they all should.
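
One guess on my part: I understand the select plugin controls whether Slurm 
packs more than one job onto a node (select/linear hands out whole nodes, 
while select/cons_tres shares them by CPU), so maybe my changes in Bright 
switched or broke that.  If that sounds plausible, this is what I'd check:

    scontrol show config | grep -i select    # SelectType and SelectTypeParameters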

Thanks
--
Chandler Sobel-Sorenson / Systems Administrator
Arizona Genomics Institute
University of Arizona
