...run into the same circumstances you have.

From: slurm-users [slurm-users-boun...@lists.schedmd.com] on behalf of Chris Samuel [ch...@csamuel.org]
Sent: Monday, September 10, 2018 4:17 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Elastic Compute

On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote:
> I believe the default value of this would prevent jobs from sharing a node.
But the jobs _do_ share a node when the resources become available, it's just
that the cloud part of Slurm is bringing up the wrong number of nodes ...

I believe the default value of this would prevent jobs from sharing a node.
You may want to look at this and change it from the default.
--
Brian D. Haymore
University of Utah
Center for High Performance Computing
155 South 1452 East RM 405
Salt Lake City, Ut 84112
Phone: 801-558-1150, Fax: 801-585-5366
http://bit.ly/1HO1N2C

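A minimal sketch of where the partition-level OverSubscribe setting Brian refers to lives in slurm.conf (the partition name and the YES value are placeholders, not values from this thread):

# slurm.conf -- OverSubscribe is set per partition; valid values are
# NO (the default), YES, FORCE and EXCLUSIVE
PartitionName=cloud Nodes=compute-[1-100] State=UP OverSubscribe=YES
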
I think you probably want CR_LLN set in your SelectTypeParameters in
slurm.conf. This makes it fill up a node before moving on to the next
instead of "striping" the jobs across the nodes.
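A minimal sketch of that change (pairing CR_LLN with CR_Core is an assumption; a site already using CR_CPU or CR_Socket would keep its existing consumable-resource option):

# slurm.conf -- CR_LLN is added alongside the consumable-resource type
SelectType=select/cons_res
SelectTypeParameters=CR_Core,CR_LLN
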
On Mon, Sep 10, 2018 at 8:29 AM Felix Wolfheimer wrote:

No, this happens without the "Oversubscribe" parameter being set. I'm using
custom resources though:
GresTypes=some_resource
NodeName=compute-[1-100] CPUs=10 Gres=some_resource:10 State=CLOUD
Submission uses:
sbatch --nodes=1 --ntasks-per-node=1 --gres=some_resource:1
But I just tried it without ...

What do you have the OverSubscribe parameter set to on the partition you're using?
--
Brian D. Haymore
University of Utah
Center for High Performance Computing
155 South 1452 East RM 405
Salt Lake City, Ut 84112
Phone: 801-558-1150, Fax: 801-585-5366
http://bit.ly/1HO1N2C
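One quick way to check the value Brian asks about (scontrol is standard; the partition name is a placeholder):

scontrol show partition cloud | grep -o 'OverSubscribe=[^ ]*'
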
After a bit more testing I can answer my original question: I was just
too impatient. When the ResumeProgram comes back with an exit code != 0,
SLURM doesn't taint the node, i.e., it tries to start it again after a
while. Exactly what I want! :-)
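
A minimal sketch of a ResumeProgram that relies on the retry behaviour described above (create_cloud_instance is a hypothetical stand-in for the real provisioning command):

#!/bin/bash
# ResumeProgram sketch: Slurm passes the hostlist of nodes to power up as $1.
hosts=$(scontrol show hostnames "$1") || exit 1
for host in $hosts; do
    # Hypothetical provisioning call; replace with the real cloud API/CLI.
    # As observed above, exiting non-zero makes Slurm retry the resume
    # later instead of giving up on the node.
    create_cloud_instance "$host" || exit 1
done
exit 0
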
@Lachlan Musicman: My slurm.conf Node and Partition ...

On 29 July 2018 at 04:32, Felix Wolfheimer wrote:
> I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm
> facing the following situation: Let's say, SLURM requests that a compute
> instance is started. The ResumeProgram tries to create the instance, but
> doesn't succeed because ...
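
For context, a minimal sketch of the power-saving/cloud pieces of slurm.conf involved in this scenario (the program paths and timeout values are placeholders; the NodeName line is the one quoted earlier in the thread):

# slurm.conf -- elastic/cloud scheduling hooks (placeholder paths and timeouts)
ResumeProgram=/usr/local/sbin/slurm_resume.sh
SuspendProgram=/usr/local/sbin/slurm_suspend.sh
ResumeTimeout=300
SuspendTime=600
NodeName=compute-[1-100] CPUs=10 Gres=some_resource:10 State=CLOUD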