Re: [slurm-users] Elastic Compute

2018-09-12 Thread Jacob Jenson
…run into the same circumstances you have.

Re: [slurm-users] Elastic Compute

2018-09-12 Thread Eli V

Re: [slurm-users] Elastic Compute

2018-09-11 Thread Felix Wolfheimer

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Brian Haymore
On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote: > I believe the default value of this would prevent jobs from sharing a node. But the jobs _do_ share a node when the resources become available…

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Chris Samuel
On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote: > I believe the default value of this would prevent jobs from sharing a node. But the jobs _do_ share a node when the resources become available, it's just that the cloud part of Slurm is bringing up the wrong number of nodes…
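
The node-sharing behaviour discussed above is normally governed by Slurm's consumable-resource selector; the thread does not show the slurm.conf in question, so the following is a purely illustrative fragment with hypothetical partition details:

    # Illustrative only; not the configuration discussed in the thread.
    # CR_CPU tracks CPUs individually, so several jobs can share one node.
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU
    # OverSubscribe=NO still allows node sharing, just not CPU sharing.
    PartitionName=cloud Nodes=compute-[1-100] Default=YES OverSubscribe=NO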

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Brian Haymore
…I'm using the SLURM Elastic Compute feature and it works great in general. However, I noticed that there's a bit of inefficiency…

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Eli V

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Felix Wolfheimer

Re: [slurm-users] Elastic Compute

2018-09-09 Thread Brian Haymore
…I'm using the SLURM Elastic Compute feature and it works great in general…

[slurm-users] Elastic Compute

2018-09-09 Thread Felix Wolfheimer
I'm using the SLURM Elastic Compute feature and it works great in general. However, I noticed that there's a bit of inefficiency in the decision about the number of nodes which SLURM creates. Let's say I have the following configuration: NodeName=compute-[1-100] CPUs=10 State=CLOUD and there are non…
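
The elastic setup being described is driven entirely by slurm.conf. A minimal sketch of such a configuration, with hypothetical script paths and timeout values that are not taken from the thread:

    # Minimal elastic/cloud sketch; paths and values are assumptions.
    NodeName=compute-[1-100] CPUs=10 State=CLOUD
    PartitionName=cloud Nodes=compute-[1-100] Default=YES State=UP
    # Scripts that create and tear down the cloud instances.
    ResumeProgram=/usr/local/sbin/slurm_resume.sh
    SuspendProgram=/usr/local/sbin/slurm_suspend.sh
    # How long Slurm waits for a booted node, and how long a node may idle.
    ResumeTimeout=600
    SuspendTime=300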

Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-30 Thread Felix Wolfheimer
After a bit more testing I can answer my original question: I was just too impatient. When the ResumeProgram comes back with an exit code != 0, SLURM doesn't taint the node, i.e., it tries to start it again after a while. Exactly what I want! :-) @Lachlan Musicman: My slurm.conf Node and Partition…
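
The retry behaviour described here is what a ResumeProgram can lean on when the cloud provider is temporarily out of capacity: fail with a non-zero exit code and let Slurm try the node again later. A minimal Python sketch, assuming a hypothetical create_instance helper rather than any particular cloud SDK:

    #!/usr/bin/env python3
    # Hypothetical ResumeProgram sketch, not taken from the thread.
    # Slurm invokes it with a hostlist argument, e.g. "compute-[1-3]".
    import subprocess
    import sys

    def expand_hostlist(hostlist):
        # scontrol expands the hostlist expression into one name per line.
        out = subprocess.run(["scontrol", "show", "hostnames", hostlist],
                             capture_output=True, text=True, check=True)
        return out.stdout.split()

    def create_instance(node):
        # Placeholder for the cloud-provider call; assumed to raise when
        # the requested instance type is unavailable.
        raise NotImplementedError

    def main():
        failed = False
        for node in expand_hostlist(sys.argv[1]):
            try:
                create_instance(node)
            except Exception as err:
                print(f"could not start {node}: {err}", file=sys.stderr)
                failed = True
        # Exit non-zero on failure; as observed above, Slurm does not
        # blacklist the node and simply retries the resume later.
        sys.exit(1 if failed else 0)

    if __name__ == "__main__":
        main()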

Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-28 Thread Lachlan Musicman
On 29 July 2018 at 04:32, Felix Wolfheimer wrote: > I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm facing the following situation: Let's say, SLURM requests that a compute instance is started. The ResumeProgram tries to create the instance, but doesn't succeed because the cloud provider can't provide the instance type…

[slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-28 Thread Felix Wolfheimer
I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm facing the following situation: Let's say, SLURM requests that a compute instance is started. The ResumeProgram tries to create the instance, but doesn't succeed because the cloud provider can't provide the instance type at this…