You may want to look at your other resources. If the memory already
allocated on a node adds up such that there isn't enough left for another
job to run, it won't matter that there are still GPUs available.
The same applies to any other resource (CPUs, cores, etc.).
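
As a rough illustration of that check, here is a minimal Python sketch. It is not Slurm code: the resource names and numbers are made up, and `blocking_resources` is a hypothetical helper that lists which resources a node can no longer supply for a given request.

```python
# Hypothetical free resources on one node; the numbers are invented for illustration.
free = {"gpus": 2, "cpus": 4, "mem_mb": 1024}

# Hypothetical per-job request.
request = {"gpus": 1, "cpus": 2, "mem_mb": 4096}


def blocking_resources(free, request):
    """Return the names of resources the node cannot supply for this request."""
    return [name for name, needed in request.items() if free.get(name, 0) < needed]


# GPUs and CPUs are sufficient, but memory is not, so the job cannot start here.
print(blocking_resources(free, request))  # ['mem_mb']
```

In this made-up case the node still has idle GPUs, yet the job cannot start there because memory is exhausted.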
Brian Andrus
On 8/10/2021 8:07 AM, Jack Chen wrote:
Does anyone have any ideas on this?
On Fri, Aug 6, 2021 at 2:52 PM Jack Chen <scs...@gmail.com> wrote:
I'm using Slurm 15.08.11. When I submit several 1-GPU jobs, Slurm
doesn't allocate nodes using a compact strategy. Does anyone know how
to solve this? Will upgrading to the latest Slurm version help?
For example, there are two nodes, A and B, with 8 GPUs per node. I
submitted eight 1-GPU jobs, and Slurm allocated the first 6 jobs on
node A, then the last 2 jobs on node B. When I then submit one job
with 8 GPUs, it stays pending because of GPU fragmentation: node A
has 2 idle GPUs and node B has 6 idle GPUs.
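
The fragmentation described above can be reproduced with a small toy model in Python. This is only a sketch under stated assumptions: it is not how Slurm selects nodes, the helper names `place`, `pack` and `can_run` are hypothetical, and each job is assumed to need all of its GPUs on a single node.

```python
def can_run(gpus, idle):
    """True if some single node has at least `gpus` idle GPUs."""
    return any(free >= gpus for free in idle.values())


def pack(candidates, idle):
    """Compact choice: prefer the node with the fewest idle GPUs (ties by name)."""
    return min(candidates, key=lambda n: (idle[n], n))


def place(jobs, nodes, pick):
    """Place each job (a GPU count) on a node chosen by `pick`; return idle GPUs."""
    idle = dict(nodes)
    for gpus in jobs:
        candidates = [n for n in idle if idle[n] >= gpus]
        if candidates:  # otherwise the job would stay pending
            idle[pick(candidates, idle)] -= gpus
    return idle


# State reported in this message: 2 idle GPUs on A, 6 idle GPUs on B.
print(can_run(8, {"A": 2, "B": 6}))  # False

# Same eight 1-GPU jobs under the toy compact policy.
packed = place([1] * 8, {"A": 8, "B": 8}, pack)
print(packed)              # {'A': 0, 'B': 8}
print(can_run(8, packed))  # True
```

In this model, packing the eight small jobs onto node A leaves node B entirely free, so the 8-GPU job could start.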
Thanks in advance!