Re: [slurm-users] Limit on number of nodes user able to request

2021-03-24 Thread Brian Andrus
Do 'sinfo -R' and see if you have any down or drained nodes.

Brian Andrus

On 3/24/2021 6:31 PM, Sajesh Singh wrote:
> Slurm 20.02, CentOS 8. I just recently noticed a strange behavior when using the powersave plugin for bursting to AWS. I have a queue configured with 60 nodes, but if I submit ...
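For reference, a quick way to check for and clear drained nodes; the node names and reason shown here are hypothetical, not taken from the original post:

    # List nodes that are down or drained, with the reason slurmctld recorded
    sinfo -R

    # Example output (hypothetical):
    # REASON               USER      TIMESTAMP           NODELIST
    # Not responding       slurm     2021-03-24T18:02:11 aws-compute-[05-09]

    # Return a drained node to service once the underlying issue is fixed
    scontrol update nodename=aws-compute-05 state=resume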

[slurm-users] Limit on number of nodes user able to request

2021-03-24 Thread Sajesh Singh
Slurm 20.02, CentOS 8. I just recently noticed a strange behavior when using the powersave plugin for bursting to AWS. I have a queue configured with 60 nodes, but if I submit a job to use all of the nodes I get the error: (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions) ...
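For context, a minimal sketch of the slurm.conf power-saving settings typically involved when bursting to a cloud provider; the program paths and timing values below are assumptions for illustration, not the poster's configuration:

    # slurm.conf (excerpt) -- hypothetical values
    SuspendProgram=/usr/local/sbin/slurm_suspend_aws.sh   # powers idle cloud nodes down
    ResumeProgram=/usr/local/sbin/slurm_resume_aws.sh     # launches cloud nodes on demand
    SuspendTime=600        # idle seconds before a node is powered down
    SuspendTimeout=120     # seconds allowed for a node to power down
    ResumeTimeout=900      # seconds slurmctld waits for a resumed node to register
    TreeWidth=60           # often raised to cover all cloud nodes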

Re: [slurm-users] Suspended and released job continues running in a "down" partition

2021-03-24 Thread Brian Andrus
Suspend is really nothing more than hitting ^S on the job, so there is no interaction between it and the partition once the job is running. What behavior would you expect? Suspend is not cancel, which is what would need to be done to get the job out of that partition (even if it were checkpointed, then ...
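For comparison, the relevant commands; the job ID here is hypothetical:

    scontrol suspend 12345   # pauses the job's processes (SIGSTOP); the job stays in its partition
    scontrol resume 12345    # continues the same processes where they left off
    scancel 12345            # actually removes the job; a new submission would then honor the partition state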

[slurm-users] Suspended and released job continues running in a "down" partition

2021-03-24 Thread Gestió Servidors
Hi, I have a new question for you: in my cluster there is a running job. I change the partition state from "up" to "down", but that job continues "running" because it was already running before the state changed. Now I explicitly run "scontrol suspend my_job". After that, my ...
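A sketch of the sequence being described, using hypothetical partition and job names; note that a suspended job is continued with 'scontrol resume' (scontrol release applies to held jobs):

    scontrol update partitionname=mypart state=down   # new jobs no longer start in this partition
    scontrol suspend 4321                             # pause the already-running job
    scontrol resume 4321                              # the job continues running despite the down partition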