Hi,

We would like to use over-subscription on a cluster that is running in the cloud.  The cluster dynamically spins CPU nodes up and down as needed.  What we see is that the least-loaded algorithm spins up the maximum number of nodes specified in the partition and loads each one with N jobs (one per CPU on an N-CPU node) before it "doubles back" and starts over-subscribing.

What we actually want is for the minimum number of nodes to be used, with one node fully loaded (up to the limit of the over-subscription setting) before another is started. That is, we really want a "most-loaded" algorithm.  This would let us run fewer nodes and reduce costs.
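
To make the difference concrete, here is a minimal sketch of the two placement policies. It is only an illustration, not scheduler code: the function names, the one-job-at-a-time placement, and the assumption that the over-subscription setting is a whole number of jobs per CPU are all hypothetical.

```python
# Hypothetical illustration of the two policies; not real scheduler code.

def place_jobs(num_jobs, max_nodes, cpus_per_node, oversubscribe, policy):
    """Place single-CPU jobs one at a time and return the job count per node."""
    # Assumption: a node may hold cpus_per_node * oversubscribe jobs in total.
    capacity = cpus_per_node * oversubscribe
    loads = [0] * max_nodes
    for _ in range(num_jobs):
        # Only nodes with room left are eligible.
        candidates = [i for i, load in enumerate(loads) if load < capacity]
        if not candidates:
            raise RuntimeError("no capacity left in the partition")
        if policy == "least-loaded":
            # Spread: pick the emptiest node (what we observe today).
            target = min(candidates, key=lambda i: loads[i])
        elif policy == "most-loaded":
            # Pack: pick the fullest node that still has room (what we want).
            target = max(candidates, key=lambda i: loads[i])
        else:
            raise ValueError(f"unknown policy: {policy}")
        loads[target] += 1
    return loads


def nodes_used(loads):
    """Count the nodes that would have to be running."""
    return sum(1 for load in loads if load > 0)


if __name__ == "__main__":
    for policy in ("least-loaded", "most-loaded"):
        loads = place_jobs(16, max_nodes=4, cpus_per_node=4,
                           oversubscribe=2, policy=policy)
        print(policy, loads, "nodes used:", nodes_used(loads))
```

With 16 jobs, 4 nodes of 4 CPUs each, and 2 jobs allowed per CPU, the least-loaded policy keeps all 4 nodes running, while the most-loaded policy fits the same jobs on 2 nodes.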

Is there a way to get this behavior somehow?

Herc

