Re: [slurm-users] Jobs blocking scheduling progress

2018-07-04 Thread Yair Yarom
Hi, As Paul mentioned, we once encountered a starvation issue with the backfill algorithm and have since set bf_window to match the maximum run time across all partitions. This could be the case here. Also make sure that the jobs can indeed run on the non-GPU nodes (we constantly encounter

Re: [slurm-users] Jobs blocking scheduling progress

2018-07-03 Thread Paul Edmon
Odds are the backfill loop is not penetrating far enough into the queue. Recall that Slurm has two scheduling loops. The primary loop is the faster one, and it only goes as deep into the queue as it can schedule. Thus in this case the primary loop would stop immediately on the GPU jobs that it can't schedule.
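
To illustrate the behaviour described above, here is a minimal toy model in Python. It is not Slurm code: the Job class, the two loop functions and the max_job_test argument are all hypothetical names made up for this sketch (the argument is only loosely modelled on a backfill depth limit), and the backfill function ignores the real scheduler's check that a backfilled job must not delay higher-priority jobs.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int
    needs_gpu: bool


def _try_start(job, free):
    """Start the job if a slot of the right kind is free; report success."""
    kind = "gpu" if job.needs_gpu else "cpu"
    if free[kind] == 0:
        return False
    free[kind] -= 1
    return True


def primary_loop(queue, free_cpu, free_gpu):
    """Toy primary loop: walk the queue by priority, stop at the first job that cannot start."""
    free = {"cpu": free_cpu, "gpu": free_gpu}
    started = []
    for job in sorted(queue, key=lambda j: -j.priority):
        if not _try_start(job, free):
            break  # everything behind this job is left untouched
        started.append(job.name)
    return started


def backfill_loop(queue, free_cpu, free_gpu, max_job_test):
    """Toy backfill loop: look at up to max_job_test jobs, skipping ones that cannot start."""
    free = {"cpu": free_cpu, "gpu": free_gpu}
    started = []
    for job in sorted(queue, key=lambda j: -j.priority)[:max_job_test]:
        if _try_start(job, free):
            started.append(job.name)
    return started


# Three high-priority GPU jobs queued ahead of two low-priority CPU-only jobs.
queue = [
    Job("gpu1", 100, True),
    Job("gpu2", 99, True),
    Job("gpu3", 98, True),
    Job("cpu1", 10, False),
    Job("cpu2", 9, False),
]

# All GPUs are busy, two CPU slots are idle.
primary = primary_loop(queue, free_cpu=2, free_gpu=0)
shallow = backfill_loop(queue, free_cpu=2, free_gpu=0, max_job_test=2)
deep = backfill_loop(queue, free_cpu=2, free_gpu=0, max_job_test=5)

print(primary)  # []
print(shallow)  # []
print(deep)     # ['cpu1', 'cpu2']
```

In this sketch the CPU-only jobs start only when the backfill pass looks deep enough to reach them, which is the situation suggested above for the real cluster.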

[slurm-users] Jobs blocking scheduling progress

2018-07-03 Thread Christopher Benjamin Coffey
Hello! We are having an issue with high priority GPU jobs blocking low priority CPU-only jobs. Our cluster is set up with one partition, "all". All nodes reside in this cluster. In this "all" partition we have four generations of compute nodes, including GPU nodes. We do this to make use of those