I posted a copy of this to the Open MPI mailing list, but I'm curious whether anyone here can offer suggestions on troubleshooting.
---
I'm getting stuck trying to run some fairly large IMB-MPI alltoall tests under Open MPI 2.0.2 on RHEL 7.4. I have two different clusters, one running Mellanox FDR10 and one running QLogic QDR.

If I issue

    mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv

the job just stalls after IMB-MPI prints the "List of Benchmarks to run: Alltoallv" line. If I switch it to alltoall, the test does progress.

Often, when running alltoalls of various sizes, I'll get

    "too many retries sending message to <>:<>, giving up"

I'm able to use InfiniBand just fine (our Lustre filesystem mounts over it), and I have other MPI programs running. The problem only seems to show up when I run alltoall-type primitives.

Any thoughts on debugging where the failures are? I might just need to turn up the debugging, but I'm not sure where.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf