I'm seeing issues on a Mellanox FDR10 cluster where MPI setup and teardown take longer than I expect they should on jobs with larger rank counts. I'm only trying to run ~1000 ranks, and the startup time is over a minute. I tested this with both Open MPI and Intel MPI, and both exhibit close to the same behavior.
Has anyone else seen this, or does anyone know how to fix it? I expect ~1000 ranks to take some time to set up, but it seems to be taking longer than I think it should.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf