Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-30 Thread Felix Wolfheimer
After a bit more testing I can answer my original question: I was just too impatient. When the ResumeProgram returns a nonzero exit code, SLURM doesn't taint the node; instead, it tries to start it again after a while. Exactly what I want! :-) @Lachlan Musicman: My slurm.conf Node and Partition
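The behaviour described in this message (a failing ResumeProgram leads to a later retry) suggests that a ResumeProgram only needs to report cloud-side failures through its exit code. A minimal sketch under stated assumptions: create_instance and InstanceUnavailable are hypothetical placeholders for whatever the cloud provider's API offers, and the retry itself is left to SLURM as reported above, not implemented here.

```python
#!/usr/bin/env python3
"""Sketch of a ResumeProgram that reports cloud failures via its exit code."""
import subprocess
import sys


class InstanceUnavailable(Exception):
    """Hypothetical error: the provider cannot supply the instance right now."""


def create_instance(hostname):
    """Hypothetical placeholder for the provider-specific creation call."""
    raise NotImplementedError("provider-specific instance creation goes here")


def expand_hostlist(hostlist):
    """Expand a hostlist such as 'node[1-3]' into names using scontrol."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", hostlist],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def resume_nodes(hostnames, create=create_instance):
    """Try to start every node; return 0 if all succeeded, 1 otherwise."""
    failed = []
    for name in hostnames:
        try:
            create(name)
        except InstanceUnavailable as exc:
            # Log and keep going so one failure does not block other nodes.
            print(f"could not start {name}: {exc}", file=sys.stderr)
            failed.append(name)
    return 1 if failed else 0


def main(argv):
    """Entry point: argv[1] is the hostlist SLURM passes to ResumeProgram."""
    if len(argv) != 2:
        print("usage: resume.py <hostlist>", file=sys.stderr)
        return 2
    return resume_nodes(expand_hostlist(argv[1]))

# When installed as the ResumeProgram, finish the script with:
#     sys.exit(main(sys.argv))
```

Returning a single nonzero code when any node fails is a simplification; whether nodes that did start are affected by the nonzero exit is not covered in this thread and would need testing.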

Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-28 Thread Lachlan Musicman
On 29 July 2018 at 04:32, Felix Wolfheimer wrote:
> I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm
> facing the following situation: Let's say, SLURM requests that a compute
> instance is started. The ResumeProgram tries to create the instance, but
> doesn't succeed because

[slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-28 Thread Felix Wolfheimer
I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm facing the following situation: Let's say, SLURM requests that a compute instance is started. The ResumeProgram tries to create the instance, but doesn't succeed because the cloud provider can't provide the instance type at this