I'm using the SLURM Elastic Compute feature and it works great in general. However, I noticed some inefficiency in how SLURM decides how many nodes to create. Let's say I have the following configuration:
    NodeName=compute-[1-100] CPUs=10 State=CLOUD

and that none of these nodes is currently up and running. Let's further say that I create 10 identical jobs and submit them at the same time using

    sbatch --nodes=1 --ntasks-per-node=1

I expected SLURM to work out that 10 CPUs are required in total to serve all of the jobs and therefore to create a single compute node. Instead, SLURM triggers the creation of one node per job, i.e., 10 nodes are created. When the first of these ten nodes is ready to accept jobs, SLURM assigns all 10 submitted jobs to that single node. The other nine nodes that were created remain idle and are terminated again after a while.

I'm using SelectType=select/cons_res to schedule at the CPU level. Is there some knob which influences this behavior, or is it hard-coded?
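
For context, here is a rough sketch of the relevant pieces of my setup; the resume/suspend program paths, the SuspendTime value, the SelectTypeParameters choice, and the batch script name are placeholders rather than my exact values:

    # slurm.conf excerpt (simplified; paths and timings are placeholders)
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU                       # schedule individual CPUs
    ResumeProgram=/usr/local/sbin/node_resume.sh      # script that creates a cloud node
    SuspendProgram=/usr/local/sbin/node_suspend.sh    # script that terminates an idle node
    SuspendTime=300                                   # seconds idle before a node is torn down
    NodeName=compute-[1-100] CPUs=10 State=CLOUD
    PartitionName=cloud Nodes=compute-[1-100] Default=YES State=UP

    # submitting the 10 identical single-CPU jobs (job.sh is a placeholder)
    for i in $(seq 1 10); do
        sbatch --nodes=1 --ntasks-per-node=1 job.sh
    done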