[slurm-users] Power/Cloud Plugin - Race Condition after Node Start - Wrong Job State

2019-09-02 Thread Felix Wolfheimer
Just stumbled on an issue which kicks in occasionally when Slurm starts/creates instances using the power/cloud plugin. Here is what happens: I'm using the Slurm Power/Cloud plugin to create compute instances on demand. Occasionally it happens that I run into the following situation when new inst

Re: [slurm-users] Elastic Compute

2018-09-12 Thread Felix Wolfheimer
it can be set per partition as >> well. >> On Tue, Sep 11, 2018 at 5:24 PM Felix Wolfheimer >> wrote: >> > >> > Thanks for the input! I tried a few more things but wasn't able to get >> the behavior I want. >> > Here's what

Re: [slurm-users] Elastic Compute

2018-09-11 Thread Felix Wolfheimer
Thanks for the input! I tried a few more things but wasn't able to get the behavior I want. Here's what I tried so far: - Set SelectTypeParameter to "CR_CPU,CR_LLN". - Set SelectTypeParameter to "CR_CPU,CR_Pack_Nodes". The documentation for this parameter seems to describe the behavior I want (pa

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Felix Wolfheimer
01-558-1150, Fax: 801-585-5366 > http://bit.ly/1HO1N2C > > > From: slurm-users [slurm-users-boun...@lists.schedmd.com] on behalf of > Felix Wolfheimer [f.wolfhei...@googlemail.com] > Sent: Sunday, September 09, 2018 1:35 PM > To: slurm-users@lists.schedmd.com > Subject: [slurm

[slurm-users] Elastic Compute

2018-09-09 Thread Felix Wolfheimer
I'm using the SLURM Elastic Compute feature and it works great in general. However, I noticed that there's a bit of inefficiency in the decision about the number of nodes which SLURM creates. Let's say I've the following configuration NodeName=compute-[1-100] CPUs=10 State=CLOUD and there are non

Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-30 Thread Felix Wolfheimer
to the instance which contains the NodeName, such that I can find it easily when SLURM calls the SuspendProgram to terminate the node. Lachlan Musicman schrieb am So., 29. Juli 2018, 04:02: > On 29 July 2018 at 04:32, Felix Wolfheimer > wrote: > >> I'm experimenting with SLUR

[slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-28 Thread Felix Wolfheimer
I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm facing the following situation: Let's say, SLURM requests that a compute instance is started. The ResumeProgram tries to create the instance, but doesn't succeed because the cloud provider can't provide the instance type at this

[slurm-users] SLURM Elastic Compute - Unable to determine this node's NodeName

2018-07-21 Thread Felix Wolfheimer
urmctld on the command line of slurmd on the node. This works fine. -- Forwarded message ----- From: Felix Wolfheimer Date: Fr., 20. Juli 2018, 23:11 Subject: SLURM Elastic Compute - Unable to determine this node's NodeName To: Hi, I'm trying to configure a cluster

[slurm-users] SLURM Elastic Compute - Unable to determine this node's NodeName

2018-07-20 Thread Felix Wolfheimer
Hi, I'm trying to configure a cluster on AWS which scales automatically using SLURM's Elastic Compute (https://slurm.schedmd.com/elastic_computing.html). However, I can't figure out how the nodes are supposed to be registered such that SLURM. I've a simple setup in my slurm.conf (shared by all no