Hello,
We are seeing an issue where smaller jobs push back larger jobs. We are using Slurm 19.05.8, so I am not sure whether this is patched in newer releases.

On a 4-node test partition I submit 3 jobs as 2 users:

ssh hpcdev1@navy51 'sbatch --nodes=3 --ntasks-per-node=40 --partition=backfilltest --time=120 --wrap="sleep 7200"'
ssh hpcdev2@navy51 'sbatch --nodes=4 --ntasks-per-node=40 --partition=backfilltest --time=60 --wrap="sleep 3600"'
ssh hpcdev2@navy51 'sbatch --nodes=4 --ntasks-per-node=40 --partition=backfilltest --time=60 --wrap="sleep 3600"'

Then I increase the priority of the pending jobs significantly. From reading the manual, my understanding is that nodes should be held for these jobs.

for job in $(squeue -h -p backfilltest -t pd -o %i); do scontrol update job ${job} priority=1000000000; done

squeue -p backfilltest -o "%i | %u | %C | %Q | %l | %S | %T"
JOBID | USER | CPUS | PRIORITY | TIME_LIMIT | START_TIME | STATE
28482 | hpcdev2 | 160 | 1000000000 | 1:00:00 | N/A | PENDING
28483 | hpcdev2 | 160 | 1000000000 | 1:00:00 | N/A | PENDING
28481 | hpcdev1 | 120 | 50083 | 2:00:00 | 2020-12-08T09:44:15 | RUNNING

So there is one node free in our 4-node partition:

backfilltest up 2-12:00:00 3 alloc reddev[001-003]
backfilltest up 2-12:00:00 1 idle reddev004

Naturally, a small job with a walltime of less than 1 hour could run on that node, but we are also seeing backfill start longer jobs:

ssh hpcdev3@navy51 'sbatch --nodes=1 --ntasks-per-node=40 --partition=backfilltest --time=720 --wrap="sleep 432000"'

squeue -p backfilltest -o "%i | %u | %C | %Q | %l | %S | %T"
JOBID | USER | CPUS | PRIORITY | TIME_LIMIT | START_TIME | STATE
28482 | hpcdev2 | 160 | 1000000000 | 1:00:00 | N/A | PENDING
28483 | hpcdev2 | 160 | 1000000000 | 1:00:00 | N/A | PENDING
28481 | hpcdev1 | 120 | 50083 | 2:00:00 | 2020-12-08T09:44:15 | RUNNING
28484 | hpcdev3 | 40 | 37541 | 12:00:00 | 2020-12-08T09:54:48 | RUNNING

Is this expected behaviour? It is also odd that the pending jobs don't have a start time.
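To make the expectation concrete, here is the arithmetic as a small shell check. It is only a sketch of my reasoning, not of how Slurm's backfill scheduler is actually implemented. It assumes GNU date (for the -d option) and that job 28481 runs for its full time limit; the helper names to_epoch and would_delay are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: illustrates the expectation, not Slurm's actual backfill code.
# Assumes GNU date (-d) and that the blocking job uses its full time limit.

# Convert an ISO timestamp to epoch seconds (UTC, to avoid DST surprises).
to_epoch() { date -u -d "$1" +%s; }

# would_delay BLOCKER_START BLOCKER_LIMIT_MIN CAND_START CAND_LIMIT_MIN
# Exit status 0 if the candidate job could still be running when the
# blocking job reaches its time limit, i.e. when the pending 4-node jobs
# could otherwise start.
would_delay() {
    local blocker_end=$(( $(to_epoch "$1") + $2 * 60 ))
    local cand_end=$(( $(to_epoch "$3") + $4 * 60 ))
    (( cand_end > blocker_end ))
}

# Values taken from the squeue output: 28481 (2 h limit), 28484 (12 h limit).
if would_delay 2020-12-08T09:44:15 120 2020-12-08T09:54:48 720; then
    echo "28484 can outlive 28481: the pending 4-node jobs are pushed back"
else
    echo "28484 ends before 28481: no delay for the pending jobs"
fi
```

With the times above, the 12-hour limit of 28484 extends well past the point where 28481 must finish, which is why I expected backfill not to start it.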
I have increased the backfill parameters significantly, but it doesn't seem to affect this at all.

SchedulerParameters=bf_window=14400,bf_resolution=2400,bf_max_job_user=80,bf_continue,default_queue_depth=1000,bf_interval=60

Best regards,
David