I could be missing something here, but if you are referring to SelectTypeParameters=CR_LLN, you could try CR_Pack_Nodes instead.

https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Pack_Nodes
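
As a rough illustration only, the relevant slurm.conf lines might look like the fragment below. The select plugin and the CR_Core choice are assumptions on my part, not something I know about your setup; only the CR_LLN and CR_Pack_Nodes option names come from the documentation linked above.

```conf
# Hypothetical slurm.conf fragment; adapt to your own configuration.
# Assumed: the cons_tres select plugin with core-level allocation.
SelectType=select/cons_tres

# Before (least-loaded node selection):
#SelectTypeParameters=CR_Core,CR_LLN

# After (suggested alternative to try):
SelectTypeParameters=CR_Core,CR_Pack_Nodes
```

Changing SelectTypeParameters generally needs the Slurm daemons restarted rather than just reconfigured, so please test this outside production first.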


If you want this configured per partition, I'm not sure that's possible. You might need to set a distribution (-m) in your job submit script or wrapper (e.g., -m block:*:*,pack).

https://slurm.schedmd.com/sbatch.html#OPT_distribution
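
Here is a minimal sketch of what such a wrapper could look like. The function name sbatch_args and the variable DEFAULT_DIST are made up for illustration; the only real Slurm pieces are the -m/--distribution option and the value suggested above. It only prints the argument list (one per line), so it is safe to try without a cluster.

```shell
# Hypothetical default, copied from the suggestion above; adjust per site.
DEFAULT_DIST='block:*:*,pack'

# Print the sbatch arguments the wrapper would use, one per line.
# If the caller already passed -m or --distribution, nothing is added.
sbatch_args() {
    for arg in "$@"; do
        case "$arg" in
            -m*|--distribution|--distribution=*)
                # Caller chose a distribution: pass everything through.
                printf '%s\n' "$@"
                return 0
                ;;
        esac
    done
    # No distribution given: prepend the packing default.
    printf '%s\n' -m "$DEFAULT_DIST" "$@"
}
```

For example, sbatch_args --wrap hostname prints -m, block:*:*,pack, --wrap and hostname on separate lines. Note that this naive scan also looks at arguments placed after the batch script name, and it assumes no argument contains a newline, so a real wrapper would need more careful parsing.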


If you're referring to something else entirely, could you elaborate on the least-loaded configuration in your setup?



On 24/02/2022 23:35:30, Herc Silverstein wrote:

Hi,

We would like to do over-subscription on a cluster that's running in the cloud.  The cluster dynamically spins up and down CPU nodes as needed.  What we see is that the least-loaded algorithm causes the maximum number of nodes specified in the partition to be spun up and each loaded with N jobs for the N CPUs in a node before it "doubles back" and starts over-subscribing.

What we actually want is for the minimum number of nodes to be used and for it to fully load (to the limit of the oversubscription setting) one node before starting up another.  That is, we really want a "most-loaded" algorithm.  This would allow us to reduce the number of nodes we need to run and reduce costs.

Is there a way to get this behavior somehow?

Herc



-- 
Regards,

Daniel Letai
+972 (0)505 870 456

