Hi D.J., I noticed you have:
PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME,FAIR_TREE

I'm pretty sure it does not make sense to have DEPTH_OBLIVIOUS and
FAIR_TREE set at the same time; you'll want to choose one of them. That
won't be the reason for this issue, but you are likely not running the
fairshare algorithm that was intended.

> My colleague is from a Moab background, and in that respect he was
> surprised not to see nodes being reserved for jobs, but it could be
> that Slurm works in a different way to try to make efficient use of
> the cluster by backfilling more aggressively than Moab.

Slurm unfortunately does not indicate when nodes are being put aside
for large jobs. I wish that it did. Nodes will instead be in the "idle"
state while being held back for a large job.

To increase the chance that more whole nodes are available for large
MPI jobs, so that they can start sooner, you might consider the
following parameters:

SelectTypeParameters=CR_Pack_Nodes

and

SchedulerParameters=pack_serial_at_end,bf_busy_nodes

Also, as Loris pointed out, bf_window will need to be set to the
maximum wall time in minutes. (A combined sketch of these settings
appears below, after the quoted messages.)

Best,
Chris

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 1/9/19, 11:52 PM, "slurm-users on behalf of Loris Bennett"
<slurm-users-boun...@lists.schedmd.com on behalf of
loris.benn...@fu-berlin.de> wrote:

Hi David,

If your maximum run-time is more than the 2 1/2 days (3600 minutes)
you have set for bf_window, you might need to increase bf_window
accordingly. See the description here:

https://slurm.schedmd.com/sched_config.html

Cheers,

Loris

Baker D.J. <d.j.ba...@soton.ac.uk> writes:

> Hello,
>
> A colleague intimated that he thought that larger jobs were tending
> to get starved out on our Slurm cluster. It's not a busy time at the
> moment, so it's difficult to test this properly. Back in November it
> was not completely unusual for a larger job to have to wait up to a
> week to start.
>
> I've extracted the key scheduling configuration out of the slurm.conf
> and I would appreciate your comments, please. Even at the busiest of
> times we notice many single compute jobs executing on the cluster,
> started either via the main scheduler or by backfill.
>
> Looking at the scheduling configuration, do you think that I'm
> favouring small jobs too much? That is, for example, should I
> increase PriorityWeightJobSize to encourage larger jobs to run?
>
> I was very keen not to starve out small/medium jobs, but perhaps
> there is too much emphasis on small/medium jobs in our setup.
>
> My colleague is from a Moab background, and in that respect he was
> surprised not to see nodes being reserved for jobs, but it could be
> that Slurm works in a different way, trying to make efficient use of
> the cluster by backfilling more aggressively than Moab. Certainly we
> see a great deal of activity from backfill.
>
> In this respect, does anyone understand the mechanism used to reserve
> nodes/resources for jobs in Slurm, or know where to look for that
> type of information?
>
> Best regards,
> David
>
> SchedulerType=sched/backfill
> SchedulerParameters=bf_window=3600,bf_resolution=180,bf_max_job_user=4
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> FastSchedule=1
> PriorityFavorSmall=NO
> PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME,FAIR_TREE
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=14-0
>
> PriorityWeightFairshare=1000000
> PriorityWeightAge=100000
> PriorityWeightPartition=0
> PriorityWeightJobSize=100000
> PriorityWeightQOS=10000
> PriorityMaxAge=7-0

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de
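
For concreteness, the suggestions above could be combined into slurm.conf
roughly as follows. This is only a sketch: the bf_window value assumes a
hypothetical 10-day maximum wall time (14400 minutes), and keeping FAIR_TREE
while dropping DEPTH_OBLIVIOUS is just one of the two possible choices, so
adjust both to match the cluster's actual time limit and the fairshare
algorithm you intend to run.

    # Keep only one of the two conflicting fairshare flags (FAIR_TREE shown here)
    PriorityFlags=SMALL_RELATIVE_TO_TIME,FAIR_TREE

    # Pack jobs onto as few nodes as possible so whole nodes stay free for MPI jobs
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core,CR_Pack_Nodes

    # bf_window should cover the longest allowed wall time, in minutes;
    # 14400 assumes a 10-day limit and is only an example value.
    # bf_resolution and bf_max_job_user are carried over from the existing config.
    SchedulerType=sched/backfill
    SchedulerParameters=bf_window=14400,bf_resolution=180,bf_max_job_user=4,pack_serial_at_end,bf_busy_nodes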
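
Separately, on David's question of whether PriorityWeightJobSize should be
increased: before changing the weights it may help to look at how much each
factor is actually contributing to pending jobs' priorities, which sprio can
show directly. The job ID below is hypothetical.

    # Show the configured weight of each priority factor
    sprio -w

    # Per-job breakdown of the age, fairshare, jobsize, partition and QOS components
    sprio -l

    # Normalised (0.0 to 1.0) factors for a single pending job, e.g. job 12345
    sprio -n -j 12345

If the JOBSIZE column is consistently tiny compared with FAIRSHARE and AGE for
the large pending jobs, that would support raising PriorityWeightJobSize.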