For this one, you want to look closely at the job itself. Is it targeting a
specific partition or nodelist?
See what resources it is requesting (scontrol show job <jobid>).
Also check the partition limits, as well as any QOS limits (if you are
using them).
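
If it helps, here is a rough Python sketch for pulling the interesting fields out of that output. It assumes the usual whitespace-separated Key=Value layout that scontrol prints; the helper names (parse_scontrol, show_job) are made up for illustration, and values that contain spaces will be split, so treat it as a quick look rather than a real parser.

```python
import subprocess


def parse_scontrol(text):
    """Naively parse whitespace-separated Key=Value tokens into a dict.

    Tokens without '=' are ignored; only the first '=' splits key from value,
    so something like TRES=cpu=1,node=1 keeps its full value.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def show_job(job_id):
    """Run 'scontrol show job <jobid>' and return the parsed fields.

    Assumes scontrol is on PATH and the job is still known to slurmctld.
    """
    result = subprocess.run(
        ["scontrol", "show", "job", str(job_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_scontrol(result.stdout)


def summarize(fields):
    """Pick out the fields most relevant to a node-count problem."""
    wanted = ("Partition", "NumNodes", "ReqNodeList", "ExcNodeList", "QOS", "Reason")
    return {name: fields.get(name) for name in wanted}
```

Calling summarize(show_job(<jobid>)) on the pending job should show at a glance whether it is pinned to a partition or nodelist and what reason the scheduler is reporting.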
Brian Andrus
On 4/1/2021 10:00 AM, Sajesh Singh wrote:
Some additional information after enabling debug3 on the slurmctld daemon:
Logs show that there are enough usable nodes for the job:
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-11
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-12
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-13
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-14
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-15
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-16
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-17
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-18
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-19
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-20
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-21
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-22
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-23
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-24
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-25
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-26
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-27
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-28
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-29
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-30
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-31
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-32
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-33
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-34
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-35
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-36
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-37
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-38
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-39
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-40
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-41
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-42
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-43
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-44
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-45
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-46
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-47
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-48
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-49
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-50
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-51
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-52
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-53
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-54
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-55
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-56
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-57
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-58
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-59
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-60
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-61
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-62
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-63
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-64
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-65
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-66
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-67
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-68
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-69
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-70
[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config
containing node-71
But then the following line is in the log as well:
debug3: select_nodes: JobId=67171529 required nodes not avail
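
To compare what the scheduler considers usable against what the job asks for, the debug2 lines above can be tallied with a small script. This is only a sketch: it assumes the log lines have the exact wording quoted here, tolerates the line wrapping added by the mail client, and the function name count_usable_nodes is my own.

```python
import re

# Matches the slurmctld debug2 message quoted in this thread, e.g.
# "found 1 usable nodes from config containing node-11".
USABLE_RE = re.compile(
    r"found\s+(\d+)\s+usable nodes from config\s+containing\s+(\S+)"
)


def count_usable_nodes(log_text):
    """Return (total usable node count, list of config node expressions).

    Whitespace is collapsed first so that messages wrapped across two
    lines are still matched.
    """
    normalized = " ".join(log_text.split())
    total = 0
    configs = []
    for match in USABLE_RE.finditer(normalized):
        total += int(match.group(1))
        configs.append(match.group(2))
    return total, configs
```

Feeding it the slurmctld log for the scheduling attempt gives a total that can be checked against the NumNodes value of the pending job.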
--
-Sajesh-
On 3/25/2021 9:02 AM, Sajesh Singh wrote:
Subject: Re: [slurm-users] Limit on number of nodes user able to request
No nodes are in a down or drained state. These are nodes that are
dynamically brought up and down via the powersave plugin. When they are
taken offline due to being idle, I believe the state is set to FUTURE
by the powersave plugin.
-Sajesh-
On 3/24/2021 11:02 PM, Brian Andrus wrote:
Do 'sinfo -R' and see if you have any down or drained nodes.
Brian Andrus
On 3/24/2021 6:31 PM, Sajesh Singh wrote:
Slurm 20.02
CentOS 8
I recently noticed some strange behavior when using the
powersave plugin for bursting to AWS. I have a queue configured
with 60 nodes, but if I submit a job to use all of the nodes I get
the error:
(Nodes required for job are DOWN, DRAINED or reserved for jobs in
higher priority partitions)
If I lower the job to request 50 nodes, it gets submitted and runs
with no problems. I do not have any associations or QOS limits in
place that would limit the user. Any ideas as to what could be
causing this limit of 50 nodes to be imposed?
-Sajesh-