On Tue, Oct 17, 2017 at 8:54 AM, Peter Kjellström <c...@nsc.liu.se> wrote:
> If you have IntelMPI also try what I suggested and use the ucm dapl.
> For example for the first port on an mlx4 hca that's "ofa-v2-mlx4_0-1u".
>
> You can make sure that it comes first in your dat.conf (/etc/rdma
> or /etc/infiniband) or pass it explicitly to IntelMPI:
>
> I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u mpiexec.hydra ...
>
> You may want to set I_MPI_DEBUG=4 or so to see what it does.
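For the record, the dapl run I tried, following the suggestion above, looked roughly like this; the hca/port name, rank count and benchmark path are just what apply on my nodes, so treat them as placeholders:

  I_MPI_FABRICS=shm:dapl \
  I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u \
  I_MPI_DEBUG=4 \
  mpiexec.hydra -n 64 -hostfile ./hosts ./IMB-MPI1 Alltoallv

The I_MPI_DEBUG output shows which fabric/provider it actually ended up using.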
I can confirm that the dapl test with Intel MPI is pretty speedy. When I start up an MPI job without dapl enabled it takes ~60 seconds before the test actually starts; with dapl enabled it's only a few seconds, and the t_avg timings in the IMB Alltoallv run I'm doing are vastly different. I think I can safely say at this point that it's probably not hardware related, but that something went wonky with Open MPI.

I downloaded the new version 3 that was released; I'll see if that fixes anything. I've also been tracking reports on the Open MPI list about issues between Slurm and Open MPI relating to PMI. I'm not sure whether that's related, but it might be.
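In case anyone else is chasing the same Slurm/PMI angle, these are the checks I plan to run against the new build; the PMI prefix is just a guess for a typical install, so adjust for your site:

  srun --mpi=list                       # which PMI flavors slurm itself offers
  ompi_info | grep -i -e slurm -e pmi   # whether this openmpi build picked up slurm/PMI support
  # and, if it didn't, rebuild openmpi against slurm's PMI:
  ./configure --with-slurm --with-pmi=/usr && make && make install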