Felix,

Right now this would require Slurm code changes.
Jacob

On Thu, Sep 13, 2018 at 12:10 AM, Felix Wolfheimer
<f.wolfhei...@googlemail.com> wrote:

> Thanks for the confirmation, Jacob. Is it possible to change this
> behavior? If there's no config parameter for this, I'm fine with changing
> the SLURM code to achieve this. It sounds like it'd be a very local change.
> Since minimizing the number of nodes is a pretty common goal for cloud
> setups, I'd also like to submit this as a feature request, then.
>
> Jacob Jenson <ja...@schedmd.com> wrote on Wed, 12 Sep 2018, 19:47:
>
>> Currently, Slurm marks allocated nodes that need to be booted as
>> unavailable to other jobs until they are booted. Once a node is booted,
>> normal packing should happen.
>>
>> Jacob
>>
>> On Wed, Sep 12, 2018 at 7:30 AM, Eli V <eliven...@gmail.com> wrote:
>>
>>> Sounds like you figured it out, but I mis-remembered and switched the
>>> case on CR_LLN. Setting it spreads the jobs out across the nodes, rather
>>> than filling one up first. Also, I believe it can be set per partition
>>> as well.
>>>
>>> On Tue, Sep 11, 2018 at 5:24 PM Felix Wolfheimer
>>> <f.wolfhei...@googlemail.com> wrote:
>>> >
>>> > Thanks for the input! I tried a few more things but wasn't able to get
>>> > the behavior I want. Here's what I tried so far:
>>> > - Set SelectTypeParameters to "CR_CPU,CR_LLN".
>>> > - Set SelectTypeParameters to "CR_CPU,CR_Pack_Nodes". The documentation
>>> > for this parameter seems to describe the behavior I want (pack jobs as
>>> > densely as possible on instances, i.e., minimize the number of
>>> > instances).
>>> > - Assign weights to the nodes as follows:
>>> >   NodeName=compute-X Weight=X
>>> >
>>> > All of these configurations result in the same behavior: if jobs come
>>> > in after the start of a node has been triggered, but the node is not
>>> > yet up and running, SLURM won't consider this resource and instead
>>> > triggers the creation of another node. As I expect this to happen
>>> > pretty regularly in the scenario I'm dealing with, that's kind of
>>> > critical for me. BTW: I'm using SLURM 18.08, and of course I restarted
>>> > slurmctld after each change to the configuration.
>>> >
>>> > Brian Haymore <brian.haym...@utah.edu> wrote on Tue, 11 Sep 2018
>>> > at 00:33:
>>> >>
>>> >> I re-read the docs and I was wrong about the default behavior. The
>>> >> default is "no", which just means don't oversubscribe the individual
>>> >> resources, whereas I thought the default was 'exclusive'. So I think
>>> >> I've been taking us down a dead end in terms of what I thought might
>>> >> help. :\
>>> >>
>>> >> I have a system here that we are running with the elastic setup, but
>>> >> there we are doing exclusive scheduling (and it's set that way in the
>>> >> conf), so I've not run into the same circumstances you have.
>>> >>
>>> >> --
>>> >> Brian D. Haymore
>>> >> University of Utah
>>> >> Center for High Performance Computing
>>> >> 155 South 1452 East RM 405
>>> >> Salt Lake City, UT 84112
>>> >> Phone: 801-558-1150, Fax: 801-585-5366
>>> >> http://bit.ly/1HO1N2C
>>> >>
>>> >> ________________________________________
>>> >> From: slurm-users [slurm-users-boun...@lists.schedmd.com] on behalf
>>> >> of Chris Samuel [ch...@csamuel.org]
>>> >> Sent: Monday, September 10, 2018 4:17 PM
>>> >> To: slurm-users@lists.schedmd.com
>>> >> Subject: Re: [slurm-users] Elastic Compute
>>> >>
>>> >> On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote:
>>> >>
>>> >> > I believe the default value of this would prevent jobs from sharing
>>> >> > a node.
>>> >>
>>> >> But the jobs _do_ share a node when the resources become available,
>>> >> it's just that the cloud part of Slurm is bringing up the wrong number
>>> >> of nodes compared to what it will actually use.
>>> >>
>>> >> --
>>> >> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
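
A minimal slurm.conf sketch of the packing-related settings discussed in this
thread is below. The node names, CPU counts, partition name, and script paths
are illustrative placeholders, and the power-saving lines show a typical
elastic-compute setup rather than Felix's actual configuration:

    # Allocate individual CPUs (CR_CPU) and pack jobs onto as few
    # nodes as possible (CR_Pack_Nodes).
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU,CR_Pack_Nodes

    # Lower-weight nodes are allocated first, so compute-1 fills up
    # before compute-2 is considered.
    NodeName=compute-1 CPUs=16 Weight=1 State=CLOUD
    NodeName=compute-2 CPUs=16 Weight=2 State=CLOUD

    # CR_LLN does the opposite of packing (it spreads jobs across the
    # least-loaded nodes); the equivalent per-partition switch is LLN=.
    PartitionName=cloud Nodes=compute-[1-2] Default=YES LLN=NO

    # Elastic/cloud power management: placeholder scripts that create
    # and tear down the instances backing the CLOUD-state nodes.
    ResumeProgram=/usr/local/sbin/slurm_resume.sh
    SuspendProgram=/usr/local/sbin/slurm_suspend.sh
    ResumeTimeout=600
    SuspendTime=300

As Jacob explains above, none of these settings changes the fact that an
allocated node which is still booting is marked unavailable to other jobs
until it is up, which is why additional instances get started in the meantime.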