I think we would need to see your SuspendScript to get a better idea of
what is happening.
That error indicates the nodes are likely not running slurmd while the
control daemon thinks they are still up.
What is the output of 'sinfo -R'?
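For reference, here is a rough sketch of the slurm.conf power-saving
settings involved, in case it helps when comparing against your setup.
The script paths and all numeric values are placeholders I made up, not
anything from your configuration. Note that the parameter is named
SuspendProgram in slurm.conf.

```
# Hypothetical slurm.conf excerpt; paths and values are placeholders.
# Program slurmctld runs to power nodes down (receives the node list as its argument).
SuspendProgram=/usr/local/sbin/suspend.sh
# Program slurmctld runs to power nodes back up.
ResumeProgram=/usr/local/sbin/resume.sh
# Seconds a node must be idle before it is suspended.
SuspendTime=600
# Seconds allowed for a node to shut down after the suspend request.
SuspendTimeout=120
# Seconds allowed for a node to become usable after the resume request.
ResumeTimeout=300
```

The node list is passed to the script in hostlist form (for example
compute-[2-3]), so the script usually expands it with
'scontrol show hostnames' before acting on each node.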
Brian Andrus
On 1/7/2020 3:42 AM, Steve Brasier wrote:
Hi all,
I've got elastic compute working with slurm but on "suspend" I get
something like the following in the slurmctld log:
power down request repeating for node compute-2
power down request repeating for node compute-3
error: Nodes compute-[2-3] not responding
The docs say that the SuspendScr