Hi David,

Baker D.J. <d.j.ba...@soton.ac.uk> writes:

> Hello,
>
> We are running Slurm 18.08.0 on our cluster and I am concerned that
> Slurm appears to be using backfill scheduling excessively. In fact the
> vast majority of jobs are being scheduled using backfill. So, for
> example, I have just submitted a set of three serial jobs. They all
> started on a compute node that was completely free, but
> disconcertingly in the slurmctl log they were all reported as started
> using backfill and that isn't making sense...
>
> [2018-11-20T12:31:27.598] backfill: Started JobId=217031 in batch on red158
> [2018-11-20T12:32:28.004] backfill: Started JobId=217032 in batch on red158
> [2018-11-20T12:33:58.608] backfill: Started JobId=217033 in batch on red158
>
> I either don't understand the context of backfill re slurm or the
> above is odd. Has anyone seen this "overuse" (unnecessary) use of
> backfill on their cluster and/or could offer advice, please.

I am not sure what "excessive backfilling" might mean. If you have a
job which requires a large amount of resources to become available
before it can start, then backfilling will allow other jobs with a
lower priority to be run, if this can be achieved without delaying the
start of the large job.

So if a job needs 100 nodes, at some point 99 of them will be idle.
Jobs which can start and finish before the 100th node becomes
available will indeed be backfilled on empty nodes. This is how
backfilling is supposed to work.

Or am I misunderstanding your problem?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin    Email loris.benn...@fu-berlin.de
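
P.S. In case it helps, here is a toy sketch in Python of the backfill
decision described above. It is only an illustration under simplifying
assumptions, not Slurm's actual scheduler code: the names Job and
can_backfill, and the numbers used, are all hypothetical. It assumes a
single waiting high-priority job and treats each job's requested time
limit as its run time.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int       # number of nodes requested
    time_limit: int  # requested wall time in minutes


def can_backfill(job, idle_nodes, minutes_until_big_job_starts):
    """Return True if `job` fits on the idle nodes and would finish
    before the waiting high-priority job is expected to start."""
    fits = job.nodes <= idle_nodes
    finishes_in_time = job.time_limit <= minutes_until_big_job_starts
    return fits and finishes_in_time


# Hypothetical situation: a 100-node job is waiting, 99 nodes are idle,
# and the last node is expected to become free in 60 minutes.
idle_nodes = 99
minutes_until_big_job_starts = 60

short_job = Job("short", nodes=1, time_limit=30)
long_job = Job("long", nodes=1, time_limit=120)

print(can_backfill(short_job, idle_nodes, minutes_until_big_job_starts))
print(can_backfill(long_job, idle_nodes, minutes_until_big_job_starts))
```

In this toy model the 30-minute job may run on an idle node because it
ends before the large job could start, whereas the 120-minute job may
not, since it would delay the large job.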