Do 'sinfo -R' and see if you have any down or drained nodes.
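For example (the partition and node names below are placeholders):

    sinfo -R                      # list down/drained nodes with the reason and who set it
    sinfo -R -p <partition>       # limit the listing to the partition in question
    scontrol update NodeName=<node> State=RESUME   # return a drained node to service once the cause is fixed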
Brian Andrus
On 3/24/2021 6:31 PM, Sajesh Singh wrote:
Slurm 20.02
CentOS 8
I just recently noticed some strange behavior when using the powersave plugin
for bursting to AWS. I have a queue configured with 60 nodes, but if I submit a
job to use all of the nodes I get the error:
(Nodes required for job are DOWN, DRAINED or reserved for jobs in higher
priority partitions)
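For context, a minimal sketch of the kind of power-save/cloud setup involved
here (the node range, partition name, CPU count, timeouts and script paths
below are assumptions, not taken from the original post):

    # slurm.conf (excerpt)
    SuspendProgram=/usr/local/sbin/slurm_suspend.sh   # terminates the idle EC2 instances
    ResumeProgram=/usr/local/sbin/slurm_resume.sh     # launches instances when nodes are needed
    SuspendTime=300                                   # power a node down after 5 minutes idle
    SuspendTimeout=120
    ResumeTimeout=600                                 # how long slurmctld waits for a resumed node to register

    NodeName=aws-[001-060] CPUs=8 State=CLOUD
    PartitionName=aws Nodes=aws-[001-060] MaxTime=INFINITE State=UP

A job asking for the whole partition would then look something like:

    sbatch -p aws -N 60 --wrap "srun hostname"

and the parenthesised reason above is what Slurm reports when it believes some
of those 60 nodes are not currently allocatable, which is exactly what the
'sinfo -R' suggestion in the reply above is meant to check.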
Suspend is really nothing more than hitting ^S on the job, so there is
no interaction between it and the partition once it gets running.
What behavior would you expect? Suspend is not cancel, which is what would need
to be done to get the job out of that partition (even if it were checkpointed,
then can ...
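Concretely (the job ID below is a placeholder), suspend and resume amount to:

    scontrol suspend 12345    # the job's processes are stopped in place; squeue shows state S
    squeue -j 12345           # the job is still in the same partition, just suspended
    scontrol resume 12345     # the processes continue and the job runs again

The job never leaves its partition while suspended; as noted above, only
cancelling it would actually get it out of there.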
Hi,
I have a new question for you:
In my cluster there is a running job. Then I change a partition's state from
"up" to "down". That job continues "running" because it was already running
before the state changed. Now I explicitly run "scontrol suspend my_job".
After that, my ...