Hi,

we had the same issue and solved it by combining the 'plane' distribution with an MPMD-style srun. For your example:


#SBATCH -N 3  # 3 nodes with 10 cores each
#SBATCH -n 21 # 21 MPI tasks in total
#SBATCH --cpus-per-task=1 # if you do not want hyperthreading

cat > mpmd.conf << EOF
0-19 ./slave_app
20 ./master_app
EOF

# plane=10: tasks are placed in blocks of 10 consecutive ranks per node
srun --distribution=plane=10 --multi-prog mpmd.conf
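
To see why this leaves the master alone on the third node, here is a rough sketch of the placement, assuming the plane distribution hands out blocks of 10 consecutive tasks to the nodes in round-robin order. The helper name plane_node is made up for illustration and is not a Slurm command; the node numbers are indices within the allocation, not hostnames.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): node index for a task under
# a plane distribution, assuming blocks of plane_size consecutive tasks
# are assigned to the nodes in round-robin order.
plane_node() {
    task=$1
    plane_size=$2
    nodes=$3
    echo $(( (task / plane_size) % nodes ))
}

# 21 tasks, plane size 10, 3 nodes, as in the job script above
for task in 0 9 10 19 20; do
    echo "task $task -> node $(plane_node "$task" 10 3)"
done
```

Under that assumption, tasks 0-9 land on node 0, tasks 10-19 on node 1, and task 20 (master_app) on node 2.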


More examples are given here:
https://slurm.schedmd.com/dist_plane.html

Best regards,
Hendryk
