Hello! I've recently started using MPI on our HPC cluster, which has 40 nodes and runs SLURM. I'm new to both MPI and SLURM, but so far everything works fine except for one thing. In short: nodes that have finished their calculations do not become idle. They all become idle only after every node has finished.
Here's an example of a typical node:

    $ scontrol show nodes cn-022
    NodeName=cn-022 Arch=x86_64 CoresPerSocket=18
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       AvailableFeatures=(null)
       ActiveFeatures=(null)
       Gres=(null)
       NodeAddr=cn-022 NodeHostName=cn-022 Version=18.08
       OS=Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018
       RealMemory=1 AllocMem=0 FreeMem=507942 Sockets=2 Boards=1
       State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
       Partitions=normal,long,shared
       BootTime=2021-06-07T20:45:06 SlurmdStartTime=2021-06-07T20:43:27
       CfgTRES=cpu=36,mem=1M,billing=36
       AllocTRES=cpu=36,mem=1M,billing=36
       CapWatts=n/a
       CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
       ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Here's my sbatch script:

    #!/bin/bash
    #SBATCH --job-name=robotune
    #SBATCH --nodes=36
    #SBATCH --ntasks=36
    #SBATCH --cpus-per-task=36
    #SBATCH --time=5-12:00:00
    #SBATCH --output="%x-%N-%j.out"

    module purge
    module load gnu8/8.3.0
    module load mpich/3.3

    srun --mpi=pmi2 /home/ptashko/work/robomarket/cmd/tune/robotune <ARGS>

And here's the CPU load of all the nodes allocated to this job:

    $ scontrol show nodes cn-[005-040] | egrep "CPULoad"
       CPUAlloc=36 CPUTot=36 CPULoad=26.53
       CPUAlloc=36 CPUTot=36 CPULoad=18.67
       CPUAlloc=36 CPUTot=36 CPULoad=4.63
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.00
       CPUAlloc=36 CPUTot=36 CPULoad=1.02
       CPUAlloc=36 CPUTot=36 CPULoad=0.98
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=0.99
       CPUAlloc=36 CPUTot=36 CPULoad=1.02
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=0.99
       CPUAlloc=36 CPUTot=36 CPULoad=0.99
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=0.99
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=0.99
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01
       CPUAlloc=36 CPUTot=36 CPULoad=1.01

And:

    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    normal*      up 11-00:00:0     37  alloc cn-[001,005-040]
    normal*      up 11-00:00:0      3   idle cn-[002-004]
    long         up 31-00:00:0     37  alloc cn-[001,005-040]
    long         up 31-00:00:0      3   idle cn-[002-004]
    shared       up   infinite     26  alloc cn-[015-040]

As you can see, almost all of the nodes have finished their calculations (their CPULoad is about 1.0 on 36 cores); only three are still working. But the nodes that have finished do not become idle! I want each node to become idle as soon as it finishes. What could I be doing wrong?

Thank you,
Grigory.
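
P.S. In case it helps anyone reproduce the picture above with node names attached, here is a rough sketch of a filter over the `scontrol show nodes` output. The function name `low_load_nodes` and the default threshold of 2.0 are my own hypothetical choices, not anything provided by SLURM; the only assumption about the input is that it contains the `NodeName=` and `CPULoad=` fields shown in the listing above.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): reads `scontrol show nodes` output
# on stdin and prints "<node> <CPULoad>" for every node whose CPULoad is
# below the threshold passed as the first argument (default: 2.0).
low_load_nodes() {
    local threshold="${1:-2.0}"
    awk -v thr="$threshold" '
        {
            for (i = 1; i <= NF; i++) {
                # Remember the name of the node record we are currently in.
                if ($i ~ /^NodeName=/) { node = substr($i, 10) }
                # CPULoad appears on a later line of the same record.
                if ($i ~ /^CPULoad=/) {
                    load = substr($i, 9)
                    if ((load + 0) < (thr + 0)) print node, load
                }
            }
        }'
}
```

It would be used roughly like this: `scontrol show nodes "cn-[005-040]" | low_load_nodes 2`. It only reports which allocated nodes look quiet; it does not change their state.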