Hi,
We'd like just one of the partitions to oversubscribe its nodes. The
nodes in it are not shared with any other partitions.
The SLURM documentation (https://slurm.schedmd.com/cons_res_share.html)
seems to indicate that the least-loaded algorithm is always used when
oversubscribe=force. I believe oversubscribe=force is what we want, but
we need it to pack each node fully first.
Thanks for pointing out the -m option. Each of our jobs is submitted
with its own sbatch, so unfortunately I don't see how we can use it in
this case.
What we want is to run, say, 8 (or 12) jobs on a 4-core node, but only
on the nodes in this one partition. The other partitions should
continue to run N jobs on an N-core node.
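
To make the setup concrete, here is a rough sketch of the kind of
slurm.conf fragment this describes. The node and partition names are
made up for illustration, and it assumes select/cons_res with CR_Core
(select/cons_tres should behave the same way for this purpose). It only
sets the per-partition oversubscription limit; it does not by itself
change the order in which nodes are filled.

```
# Sketch only: node and partition names below are hypothetical.
SelectType=select/cons_res
SelectTypeParameters=CR_Core

NodeName=over[01-04] CPUs=4
NodeName=std[01-04]  CPUs=4

# Each core may be allocated to up to 2 jobs, so a 4-core node can run
# 8 single-core jobs (FORCE:3 would allow 12).
PartitionName=oversub  Nodes=over[01-04] OverSubscribe=FORCE:2

# No oversubscription: N single-core jobs on an N-core node.
PartitionName=standard Nodes=std[01-04]  OverSubscribe=NO Default=YES
```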
Herc
<html style="direction: ltr;"> <head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style id="bidiui-paragraph-margins" type="text/css">body p { margin-bottom: 0cm; margin-top: 0pt; } </style>
</head>
<body bidimailui-charset-is-forced="true" style="direction: ltr;">
<p>I could be missing something here, but if you are referring to <b>SelectTypeParameters=cr_lln</b>,
you could just try cr_pack_nodes.</p>
<p><a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Pack_Nodes">https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Pack_Nodes</a><br>
</p> <p><br> </p>
<p>If you want this configured per partition, I'm not sure
that's possible; you might need to set a distribution (-m) in your
job submit script/wrapper (e.g., -m block:*:*,pack).</p>
<p><a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/sbatch.html#OPT_distribution">https://slurm.schedmd.com/sbatch.html#OPT_distribution</a><br>
</p> <p><br> </p>
<p>If you're referring to something else entirely, could you
elaborate on the least-loaded configuration in your setup?</p>
<p><br> </p> <p><br> <b></b></p>
<div class="moz-cite-prefix">On 24/02/2022 23:35:30, Herc
Silverstein wrote:<br> </div> <blockquote type="cite"
cite="mid:3145b0e8-6ae0-f233-5080-36cdbba66...@schrodinger.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<p>Hi,</p>
<p>We would like to do over-subscription on a cluster that's
running in the cloud. The cluster dynamically spins CPU nodes up
and down as needed. What we see is that the least-loaded
algorithm causes the maximum number of nodes specified in the
partition to be spun up and each loaded with N jobs for the N
CPUs in a node before it "doubles back" and starts
over-subscribing.</p>
<p>What we actually want is for the <i>minimum </i>number of
nodes to be used and for it to fully load (to the limit of the
oversubscription setting) one node before starting up another.
That is, we really want a "most-loaded" algorithm. This would
allow us to reduce the number of nodes we need to run and reduce
costs.</p>
<p>Is there a way to get this behavior somehow?</p>
<p>Herc</p> <p><br> </p> <p><br> </p>
</blockquote> <pre class="moz-signature" cols="72">-- Regards,
Daniel Letai +972 (0)505 870 456</pre> </body> </html>