Hi,
You can submit each pimpleFoam run as a separate job. Or, if you really
want to submit them as a single job, you can use a tool such as GNU
Parallel to run only as many of them at a time as you have CPUs:
https://www.gnu.org/software/parallel/
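For the single-job route, here is a rough sketch of what that could look
like (the -n 72 allocation, the -j 8 limit and the --exclusive flag are my
assumptions, based on the 8 cases with 9 tasks each in your script below;
adjust them to your setup):

#!/bin/bash
#SBATCH -N 2 -n 72

# Let GNU Parallel run at most 8 cases at a time; each slot starts
# one srun job step with 9 tasks. --exclusive keeps the steps on
# separate cores, so extra steps wait instead of oversubscribing.
parallel -j 8 \
    'srun --exclusive -n 9 pimpleFoam -case {} -parallel > /dev/null' \
    ::: /mnt/NFS/users/quast/channel395-{10..17}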
Regards,
Ahmet M.
On 10.10.2020 14:05, Max Quast wrote:
Dear slurm-users,
I built a Slurm system consisting of two nodes (Ubuntu 20.04.1, Slurm
20.02.5):
# COMPUTE NODES
GresTypes=gpu
NodeName=lsm[216-217] Gres=gpu:tesla:1 CPUs=64 RealMemory=192073 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP
The slurmctld is running on a separate Ubuntu system where no slurmd is
installed.
If a user executes this script (sbatch srun2.bash)
#!/bin/bash
#SBATCH -N 2 -n9
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-10 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-11 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-12 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-13 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-14 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-15 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-16 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-17 -parallel > /dev/null &
wait
8 jobs with 9 threads are launched and distributed across the two nodes.
If more such scripts are started at the same time, all the srun commands
are executed even though no free cores are available, so the nodes become
overallocated.
How can this be prevented?
Thx :)
Greetings
max