Hi,

We'd like to have just one of the partitions oversubscribe the nodes in it.  The nodes are not shared with any other partitions.

The SLURM documentation (https://slurm.schedmd.com/cons_res_share.html) seems to indicate that the least-loaded algorithm is always used when OverSubscribe=FORCE.  I believe OverSubscribe=FORCE is what we want, except that we'd like it to pack each node fully first.

Thanks for pointing out the -m option.  Our jobs are each submitted with a separate sbatch, so unfortunately I don't see how we can use it in this case.

What we want to be able to do is run, say, 8 (or 12) jobs on a 4-core node, but only on the nodes in this one partition.  The other partitions should continue to run N jobs on an N-core node.
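
To make that concrete, here is a rough sketch of the kind of slurm.conf layout we have in mind.  The node and partition names are made up for illustration and this is not our actual configuration.  It only shows the per-partition oversubscription part; it does not by itself change the order in which nodes get filled, which is the open question.

```
# Illustrative sketch only: node and partition names are hypothetical.
SelectType=select/cons_res
SelectTypeParameters=CR_Core

# Two sets of 4-core nodes; each set belongs to exactly one partition.
NodeName=over[01-10] Sockets=1 CoresPerSocket=4 ThreadsPerCore=1
NodeName=norm[01-10] Sockets=1 CoresPerSocket=4 ThreadsPerCore=1

# Only this partition oversubscribes.  FORCE:2 lets each core be
# allocated to up to 2 jobs, i.e. 8 single-core jobs on a 4-core node
# (FORCE:3 would allow 12).
PartitionName=oversub Nodes=over[01-10] OverSubscribe=FORCE:2

# The other partitions keep N jobs on an N-core node.
PartitionName=normal Nodes=norm[01-10] OverSubscribe=NO Default=YES
```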

Herc


<html style="direction: ltr;">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <style id="bidiui-paragraph-margins" type="text/css">body p { margin-bottom: 0cm; margin-top: 0pt; } </style>
  </head>
  <body bidimailui-charset-is-forced="true" style="direction: ltr;">
    <p>I could be missing something here, but if you are referring to <b>SelectTypeParameters=CR_LLN</b>, you could just try CR_Pack_Nodes.</p>
    <p><a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Pack_Nodes">https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Pack_Nodes</a><br>
    </p>
    <p><br>
    </p>
    <p>If you want this configured per partition, I'm not sure that's possible; you might need to set a distribution (-m) in your job submit script/wrapper (e.g., -m block:*:*,pack).</p>
    <p><a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/sbatch.html#OPT_distribution">https://slurm.schedmd.com/sbatch.html#OPT_distribution</a><br>
    </p>
    <p><br>
    </p>
    <p>If you're referring to something else entirely, could you elaborate on the least-loaded configuration in your setup?</p>
    <p><br>
    </p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 24/02/2022 23:35:30, Herc Silverstein wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:3145b0e8-6ae0-f233-5080-36cdbba66...@schrodinger.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <p>Hi,</p>
      <p>We would like to do over-subscription on a cluster that's running in the cloud.  The cluster dynamically spins up and down CPU nodes as needed.  What we see is that the least-loaded algorithm causes the maximum number of nodes specified in the partition to be spun up and each loaded with N jobs for the N CPUs in a node before it "doubles back" and starts over-subscribing.</p>
      <p>What we actually want is for the <i>minimum</i> number of nodes to be used and for it to fully load (to the limit of the oversubscription setting) one node before starting up another.  That is, we really want a "most-loaded" algorithm.  This would allow us to reduce the number of nodes we need to run and reduce costs.</p>
      <p>Is there a way to get this behavior somehow?</p>
      <p>Herc</p>
      <p><br>
      </p>
      <p><br>
      </p>
    </blockquote>
    <pre class="moz-signature" cols="72">-- Regards, Daniel Letai +972 (0)505 870 456</pre>
  </body>
</html>

