Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-30 Felix Wolfheimer
After a bit more testing I can answer my original question: I was just too impatient. When the ResumeProgram comes back with an exit code != 0, SLURM doesn't taint the node, i.e., it tries to start it again after a while. Exactly what I want! :-)

@Lachlan Musicman: My slurm.conf Node and Partition …
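For illustration, here is a minimal sketch of a ResumeProgram that reports failure through its exit code, which is the behaviour discussed above. Slurm runs the ResumeProgram with one argument, the node names in hostlist expression format, and `scontrol show hostnames` expands that expression. `create_instance` is a hypothetical hook standing in for whatever cloud API call creates the instance. The retry-after-failure behaviour comes from the observation in this thread, not from anything the script itself does.

```python
import subprocess
import sys
from typing import Callable, Iterable, List


def expand_hostlist(expr: str) -> List[str]:
    """Expand a Slurm hostlist expression (e.g. 'cloud[1-3]') via scontrol."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", expr],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def resume(hostnames: Iterable[str], create_instance: Callable[[str], bool]) -> int:
    """Try to start every requested node and return the exit code for Slurm.

    create_instance is a hypothetical hook around the cloud provider's API;
    it should return True if the instance was created.
    """
    failed = []
    for host in hostnames:
        try:
            ok = create_instance(host)
        except Exception as exc:  # API errors, quota limits, timeouts, ...
            print(f"resume: {host}: {exc}", file=sys.stderr)
            ok = False
        if not ok:
            failed.append(host)
    if failed:
        # Non-zero exit: report that at least one node could not be started.
        print("resume: failed to start " + ",".join(failed), file=sys.stderr)
        return 1
    return 0


def main(argv: List[str], create_instance: Callable[[str], bool]) -> int:
    """Entry point: argv[1] is the hostlist expression passed by Slurm."""
    if len(argv) != 2:
        print("usage: resume.py HOSTLIST", file=sys.stderr)
        return 2
    return resume(expand_hostlist(argv[1]), create_instance)
```

A real script would finish with something like `sys.exit(main(sys.argv, my_create_instance))`, where `my_create_instance` is your own (hypothetical) provider wrapper. Catching exceptions per node means one failed instance does not stop the others from being attempted, and the overall exit code still signals the failure.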

Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-28 Lachlan Musicman
On 29 July 2018 at 04:32, Felix Wolfheimer wrote:
> I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm
> facing the following situation: Let's say, SLURM requests that a compute
> instance is started. The ResumeProgram tries to create the instance, but
> doesn't succeed because …