For this one, you want to look closely at the job. Is it targeting a specific partition/nodelist?

See what resources it is requesting ('scontrol show job <jobid>').
Also check the partition limits and any QOS limits (if you are using them).
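
As a rough illustration of the check above, here is a minimal Python sketch that pulls the fields most relevant to a node-count problem out of 'scontrol show job' output. The helper names (parse_scontrol, job_summary) and the choice of fields are my own, not part of Slurm. The parser splits naively on whitespace, so a value containing spaces (a job name, for example) will be cut short.

```python
import subprocess

# Fields from `scontrol show job` that usually matter when a job
# will not start because of node selection.
FIELDS = ("JobState", "Reason", "Partition", "QOS",
          "NumNodes", "ReqNodeList", "ExcNodeList")


def parse_scontrol(text):
    """Turn `scontrol show job` output into a dict of Key=Value pairs.

    Naive: splits on whitespace, so values containing spaces are truncated.
    If a key appears more than once, the first occurrence is kept.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields.setdefault(key, value)
    return fields


def job_summary(jobid):
    """Run `scontrol show job <jobid>` and return only the fields in FIELDS."""
    out = subprocess.run(
        ["scontrol", "show", "job", str(jobid)],
        capture_output=True, text=True, check=True,
    ).stdout
    info = parse_scontrol(out)
    return {name: info.get(name) for name in FIELDS}
```

Calling job_summary(<jobid>) on the controller (or any host where scontrol works) returns a small dict. A non-null ReqNodeList or an unexpected Partition or QOS value is the first thing to look for.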

Brian Andrus

On 4/1/2021 10:00 AM, Sajesh Singh wrote:

Some additional information after enabling debug3 on the slurmctld daemon:

Logs show that there are enough usable nodes for the job:

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-11

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-12

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-13

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-14

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-15

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-16

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-17

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-18

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-19

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-20

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-21

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-22

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-23

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-24

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-25

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-26

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-27

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-28

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-29

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-30

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-31

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-32

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-33

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-34

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-35

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-36

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-37

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-38

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-39

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-40

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-41

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-42

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-43

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-44

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-45

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-46

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-47

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-48

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-49

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-50

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-51

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-52

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-53

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-54

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-55

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-56

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-57

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-58

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-59

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-60

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-61

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-62

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-63

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-64

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-65

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-66

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-67

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-68

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-69

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-70

[2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config containing node-71

But then the following line is in the log as well:

debug3: select_nodes: JobId=67171529 required nodes not avail
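
To compare the log against the size of the request, the debug2 lines can be tallied with a short script. This is only a sketch: the regular expression is written against the lines quoted above, and the function name count_usable is made up for illustration. It reports what the log says was found per config line, which on its own does not show that those nodes could be allocated when the job was scheduled.

```python
import re

# Matches lines such as:
#   debug2: found 1 usable nodes from config containing node-11
USABLE = re.compile(
    r"debug2: found (\d+) usable nodes from config containing (\S+)"
)


def count_usable(log_lines):
    """Return (total usable nodes, list of node expressions) from log lines."""
    total = 0
    names = []
    for line in log_lines:
        match = USABLE.search(line)
        if match:
            total += int(match.group(1))
            names.append(match.group(2))
    return total, names


# Example use (the log path is an assumption; check SlurmctldLogFile
# in slurm.conf for the real location):
#
#   with open("/var/log/slurmctld.log") as log:
#       total, names = count_usable(log)
#   print(total, "usable nodes reported")
```

Run over the excerpt above, this counts 61 nodes (node-11 through node-71).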

--

-Sajesh-

*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of *Sajesh Singh
*Sent:* Thursday, March 25, 2021 9:02 AM
*To:* Slurm User Community List <slurm-users@lists.schedmd.com>
*Subject:* Re: [slurm-users] Limit on number of nodes user able to request

No nodes are in a down or drained state. These are nodes that are dynamically brought up and down via the powersave plugin. When they are taken offline due to being idle, I believe the state is set to FUTURE by the powersave plugin.

-Sajesh-

*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of *Brian Andrus
*Sent:* Wednesday, March 24, 2021 11:02 PM
*To:* slurm-users@lists.schedmd.com
*Subject:* Re: [slurm-users] Limit on number of nodes user able to request

Do 'sinfo -R' and see if you have any down or drained nodes.

Brian Andrus

On 3/24/2021 6:31 PM, Sajesh Singh wrote:

    Slurm 20.02

    CentOS 8

    I just recently noticed a strange behavior when using the
    powersave plugin for bursting to AWS. I have a queue configured
    with 60 nodes, but if I submit a job to use all of the nodes I get
    the error:

    (Nodes required for job are DOWN, DRAINED or reserved for jobs in
    higher priority partitions)

    If I lower the job to request 50 nodes it gets submitted and runs
    with no problems. I do not have any associations or QOS limits in
    place that would limit the user. Any ideas as to what could be
    causing this limit of 50 nodes to be imposed?

    -Sajesh-
