Felix,

Right now this would require Slurm code changes.
Jacob

On Thu, Sep 13, 2018 at 12:10 AM, Felix Wolfheimer
<f.wolfhei...@googlemail.com> wrote:

> Thanks for the confirmation, Jacob. Is it possible to change this
> behavior? If there's no config parameter for this, I'm fine with changing
> the SLURM code to achieve this. It sounds like it'd be a very local change.
> Since minimizing the number of nodes is a pretty common goal for cloud
> setups, I'd also like to submit this as a feature request, then.
>
> Jacob Jenson <ja...@schedmd.com> wrote on Wed, 12 Sep 2018, 19:47:
>
>> Currently, Slurm marks allocated nodes that need to be booted as
>> unavailable to other jobs until they are booted. Once a node is booted,
>> normal packing should happen.
>>
>> Jacob
>>
>> On Wed, Sep 12, 2018 at 7:30 AM, Eli V <eliven...@gmail.com> wrote:
>>
>>> Sounds like you figured it out, but I mis-remembered and switched the
>>> case on CR_LLN. Setting it spreads the jobs out across the nodes, rather
>>> than filling one up first. Also, I believe it can be set per partition
>>> as well.
>>>
>>> On Tue, Sep 11, 2018 at 5:24 PM Felix Wolfheimer
>>> <f.wolfhei...@googlemail.com> wrote:
>>> >
>>> > Thanks for the input! I tried a few more things but wasn't able to get
>>> > the behavior I want. Here's what I tried so far:
>>> > - Set SelectTypeParameters to "CR_CPU,CR_LLN".
>>> > - Set SelectTypeParameters to "CR_CPU,CR_Pack_Nodes". The documentation
>>> > for this parameter seems to describe the behavior I want (pack jobs as
>>> > densely as possible on instances, i.e., minimize the number of
>>> > instances).
>>> > - Assign weights to the nodes as follows:
>>> >   NodeName=compute-X Weight=X
>>> >
>>> > All of these configurations result in the same behavior: if jobs come
>>> > in after the start of a node has been triggered, but the node is not
>>> > yet up and running, SLURM won't consider this resource and instead
>>> > triggers the creation of another node. As I expect this to happen
>>> > pretty regularly in the scenario I'm dealing with, that's kind of
>>> > critical for me. BTW: I'm using SLURM 18.08, and of course I restarted
>>> > slurmctld after each change to the configuration.
>>> >
>>> > Brian Haymore <brian.haym...@utah.edu> wrote on Tue, 11 Sep 2018
>>> > at 00:33:
>>> >>
>>> >> I re-read the docs and I was wrong about the default behavior. The
>>> >> default is "no", which just means don't oversubscribe the individual
>>> >> resources, whereas I thought the default was 'exclusive'. So I think
>>> >> I've been taking us down a dead end in terms of what I thought might
>>> >> help. :\
>>> >>
>>> >> I have a system here that we are running with the elastic setup, but
>>> >> there we are doing exclusive scheduling (and it's set that way in the
>>> >> conf), so I've not run into the same circumstances you have.
>>> >>
>>> >> --
>>> >> Brian D. Haymore
>>> >> University of Utah
>>> >> Center for High Performance Computing
>>> >> 155 South 1452 East RM 405
>>> >> Salt Lake City, UT 84112
>>> >> Phone: 801-558-1150, Fax: 801-585-5366
>>> >> http://bit.ly/1HO1N2C
>>> >>
>>> >> ________________________________________
>>> >> From: slurm-users [slurm-users-boun...@lists.schedmd.com] on behalf
>>> >> of Chris Samuel [ch...@csamuel.org]
>>> >> Sent: Monday, September 10, 2018 4:17 PM
>>> >> To: slurm-users@lists.schedmd.com
>>> >> Subject: Re: [slurm-users] Elastic Compute
>>> >>
>>> >> On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote:
>>> >>
>>> >> > I believe the default value of this would prevent jobs from sharing
>>> >> > a node.
>>> >>
>>> >> But the jobs _do_ share a node when the resources become available,
>>> >> it's just that the cloud part of Slurm is bringing up the wrong number
>>> >> of nodes compared to what it will actually use.
>>> >>
>>> >> --
>>> >> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
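
A minimal slurm.conf sketch of the packing-related settings discussed in this
thread is below. The node names, CPU counts, partition name, and script paths
are illustrative placeholders, and the power-saving lines show a typical
elastic-compute setup rather than Felix's actual configuration:

    # Allocate individual CPUs (CR_CPU) and pack jobs onto as few
    # nodes as possible (CR_Pack_Nodes).
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU,CR_Pack_Nodes

    # Lower-weight nodes are allocated first, so compute-1 fills up
    # before compute-2 is considered.
    NodeName=compute-1 CPUs=16 Weight=1 State=CLOUD
    NodeName=compute-2 CPUs=16 Weight=2 State=CLOUD

    # CR_LLN does the opposite of packing (it spreads jobs across the
    # least-loaded nodes); the equivalent per-partition switch is LLN=.
    PartitionName=cloud Nodes=compute-[1-2] Default=YES LLN=NO

    # Elastic/cloud power management: placeholder scripts that create
    # and tear down the instances backing the CLOUD-state nodes.
    ResumeProgram=/usr/local/sbin/slurm_resume.sh
    SuspendProgram=/usr/local/sbin/slurm_suspend.sh
    ResumeTimeout=600
    SuspendTime=300

As Jacob explains above, none of these settings changes the fact that an
allocated node which is still booting is marked unavailable to other jobs
until it is up, which is why additional instances get started in the meantime.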