On Tue, Oct 17, 2017 at 8:54 AM, Peter Kjellström <c...@nsc.liu.se> wrote:
> If you have IntelMPI also try what I suggested and use the ucm dapl.
> For example for the first port on an mlx4 hca that's "ofa-v2-mlx4_0-1u".
>
> You can make sure that it comes first in your dat.conf (/etc/rdma
> or /etc/infiniband) or pass it explicitly to IntelMPI:
>
> I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u mpiexec.hydra ...
>
> You may want to set I_MPI_DEBUG=4 or so to see what it does.
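
(For anyone following along: dat.conf typically lives at
/etc/rdma/dat.conf or /etc/dat.conf depending on the distro, and since
the first matching entry wins, a quick way to see which provider gets
picked by default is something like:

  grep -v '^#' /etc/rdma/dat.conf | head -1

Adjust the path to wherever your install keeps it.)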

I can confirm that the DAPL test with Intel MPI is pretty speedy.

When I start an MPI job without DAPL enabled, it takes ~60 seconds
before the test actually begins; with DAPL enabled it's only a few
seconds. The t_avg timings in the IMB alltoallv runs I'm doing are
also vastly different.
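
Roughly, the two runs I'm comparing look like this (the hosts file and
rank count are placeholders, and ofa-v2-mlx4_0-1u assumes the first
port of an mlx4 HCA, as above):

  # Intel MPI with DAPL forced, the ucm provider pinned, and debug output
  I_MPI_FABRICS=shm:dapl I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u I_MPI_DEBUG=4 \
      mpiexec.hydra -f hosts -n <nranks> ./IMB-MPI1 Alltoallv

  # the same benchmark under Open MPI for comparison
  mpirun --hostfile hosts -np <nranks> ./IMB-MPI1 Alltoallv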

I think I can safely say at this point that it's probably not hardware
related; something went wonky with Open MPI. I downloaded the new
version 3 that was just released and will see if that fixes anything.
I've also been tracking reports on the Open MPI list about issues
between Slurm and Open MPI relating to PMI. I'm not sure whether
that's connected, but it might be.
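
For anyone hitting the same thing, the PMI angle comes down to how the
job is launched under Slurm: a direct srun launch goes through Slurm's
PMI/PMIx plumbing, while mpirun does its own launching inside the
allocation, and the Open MPI build has to be configured with matching
PMI support for the former to work. Roughly (rank count is a
placeholder):

  # see which PMI flavours this slurm supports
  srun --mpi=list

  # direct launch through slurm
  srun --mpi=pmi2 -n <nranks> ./IMB-MPI1 Alltoallv

  # launch through Open MPI's own mpirun inside the allocation
  mpirun -np <nranks> ./IMB-MPI1 Alltoallv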