I would suggest using GNU Parallel (https://www.gnu.org/software/parallel/). Also, if you run that many srun commands in a row on a very large cluster where slurmctld is heavily loaded, some of the srun calls might time out and not run.
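A minimal sketch of how GNU Parallel could drive the 20 test scripts. It assumes GNU Parallel is installed on the nodes and that Slurm counts each hardware thread as one CPU (so each node offers 12). Instead of one 12-CPU task per node, the job asks for one task per script. `parallel` then starts one single-CPU `srun` step per script and keeps at most `$SLURM_NTASKS` of them running at once, while `--delay` staggers the `srun` calls so slurmctld does not receive all of them at the same moment. The helper name `launch_step` is hypothetical, and the paths are the placeholders from Marcus's script.

```shell
#!/bin/bash
#SBATCH --job-name=TestParallel
#SBATCH --nodes=2
#SBATCH --nodelist=node1,node2
#SBATCH --ntasks=20
#SBATCH --cpus-per-task=1
#SBATCH --output="%x-%4j-%N.out"

# Hypothetical helper: run script number $1 as its own one-CPU job step.
# --exclusive stops steps from sharing CPUs, so each step needs a free CPU
# somewhere in the two-node allocation.
launch_step() {
    local i="$1"
    srun --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive \
         --job-name="Testp-$i" --output="/path/to/test_prog$i.log" \
         "/path/to/test_prog$i.sh"
}
# GNU Parallel runs each command in a new shell, so export the function.
export -f launch_step

if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    # At most one running step per allocated task; start a new srun
    # every 0.2 s instead of all 20 at once.
    parallel --jobs "$SLURM_NTASKS" --delay 0.2 launch_step ::: {1..20}
else
    echo "Not inside a Slurm allocation; submit this script with sbatch." >&2
fi
```

Slurm still decides which node each step lands on, so the split is not guaranteed to be exactly 12 + 8. How `srun` allocates CPUs to steps changed around Slurm 20.11 (the release that introduced the `--exact` option), so if steps still pile up on one node it is worth checking the `srun` man page for the installed release.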
Richard

On Fri, 5 Nov 2021 at 05:45, Marcus Pedersén <marcus.peder...@slu.se> wrote:
> Hi all,
>
> I have set up a basic Slurm system and have been testing out
> a number of things.
> The latest thing I started to test is the parallel parts.
> What I have is about 70 independent scripts that would be
> ideal to run in parallel.
> For testing purposes I have created 20 dummy scripts
> that print the script name and hostname, sleep for one minute
> and print the number of minutes.
>
> The way I want to run this is to allocate 2 nodes
> and run all 20 scripts in parallel, each one in its
> own process.
> My idea is that the first node will be filled up with 12 processes,
> each process running one script, and the second node will run
> the rest of the processes/scripts (8 scripts on 8 processes).
> I have read up on a couple of tutorials and looked at the documentation
> for different parts of Slurm.
> But whatever flags I use for sbatch and srun, I do not seem to
> be able to accomplish what I want.
> All nodes have 6 cores with 2 threads each.
>
> The closest I have come is with this small sbatch script:
>
> #!/bin/bash
> #SBATCH --job-name=TestParallel
> #SBATCH --nodes=2
> #SBATCH --ntasks-per-node=1
> #SBATCH --ntasks=2
> #SBATCH --cpus-per-task=12
> #SBATCH --nodelist=node1,node2
> #SBATCH --output="%x-%4j-%N.out"
> #SBATCH --mail-user=my@mail
> #SBATCH --mail-type=ALL
>
> echo
> date +%Y-%m-%d" "%H-%M-%S
>
> for i in {1..20}
> do
>     srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=1 \
>         --exclusive --job-name=Testp-$i --output=/path/to/test_prog$i.log \
>         /path/to/test_prog$i.sh &
> done
>
> date +%Y-%m-%d" "%H-%M-%S
>
> wait
>
> sacct gives the following output:
>
> 505         TestParal+  all  marcus  24  RUNNING  node[1-2]  0:0
> 505.batch   batch                    12  RUNNING  node1      0:0
> 505.0       Testp-3                   1  RUNNING  node1      0:0
> 505.1       Testp-6                   1  RUNNING  node2      0:0
> 505.2       Testp-2                   1  RUNNING  node1      0:0
> 505.3       Testp-13                  1  RUNNING  node1      0:0
> 505.4       Testp-9                   1  RUNNING  node1      0:0
> 505.5       Testp-11                  1  RUNNING  node1      0:0
> 505.6       Testp-16                  1  RUNNING  node1      0:0
> 505.7       Testp-12                  1  RUNNING  node1      0:0
> 505.8       Testp-20                  1  RUNNING  node1      0:0
> 505.9       Testp-4                   1  RUNNING  node1      0:0
> 505.10      Testp-19                  1  RUNNING  node1      0:0
> 505.11      Testp-10                  1  RUNNING  node1      0:0
> 505.12      Testp-5                   1  RUNNING  node1      0:0
>
> Slurm only uses one process on node2, and of course I want all of the
> last 8 processes to run on node2.
>
> I have tried a number of other options, usually ending up running the
> same script multiple times, and that is not what I want.
>
> I feel a bit stuck and cannot get my head around this.
>
> I would really appreciate some help!!
>
> Many thanks in advance!!
>
> Best regards
> Marcus
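The layout Marcus describes, 12 scripts on node1 and the remaining 8 on node2, follows from each node having 6 cores with 2 threads each, i.e. 12 CPUs per node. The sketch below only computes that fill-node1-first placement and prints (without running) the `srun` line each script would get, using `srun`'s `--nodelist` option to pin the step to its node. It assumes Slurm treats every hardware thread as one CPU; the helper `node_for` is hypothetical.

```shell
#!/bin/bash
# Assumption: 6 cores x 2 threads per node, and Slurm counts each thread as a CPU.
cores_per_node=6
threads_per_core=2
cpus_per_node=$(( cores_per_node * threads_per_core ))   # 12 CPUs per node
nodes=(node1 node2)

# Hypothetical helper: node that script number $1 (1-based) is placed on
# when node1 is filled completely before anything goes to node2.
node_for() {
    local idx=$(( ($1 - 1) / cpus_per_node ))
    echo "${nodes[$idx]}"
}

# Dry run: print the step command for each script instead of executing it.
for i in {1..20}; do
    echo "srun --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive" \
         "--nodelist=$(node_for "$i") /path/to/test_prog$i.sh"
done
```

With these numbers, scripts 1 to 12 map to node1 and scripts 13 to 20 map to node2.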