Re: [Beowulf] mpi slow pairs

Prentice Bisbal Tue, 02 Sep 2014 08:46:35 -0700


On 08/29/2014 11:30 AM, Michael Di Domenico wrote:

On Fri, Aug 29, 2014 at 9:32 AM, John Hearns <john.hea...@viglen.co.uk> wrote:

I would say the usual tool for that pair-wise comparison is Intel IMB
https://software.intel.com/en-us/articles/intel-mpi-benchmarks
I hope I have got your requirement correct!

John,


Close, but not exact.  IMB will test ranks, but will not tell me if a
specific pair of ranks is slower than others, only the collective of
the ranks under test.  what i'm looking for is an mpi version of this:

for x in node{1..100}; do
  for y in node{1..100}; do
    [ "$x" = "$y" ] && continue    # skip self-pairs
    mpirun -n 2 -npernode 1 -host "$x,$y" bwtest > "$x$y.log"
  done
done

unfortunately, the mpirun task takes about 3 secs per iteration, and
with 10k iterations, it's going to take a long time and i'm being
impatient.  i've been trying to write the mpi code myself, but my mpi
is a little rusty so it's slow going...
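
One way to shorten the wall-clock time without writing any MPI code is
to launch disjoint pairs at the same time. The sketch below is only an
illustration under stated assumptions: it uses the standard round-robin
"circle method" to split the hosts into rounds in which every host
appears at most once, so the pairs within a round can run concurrently.
The function name round_robin_rounds and the node1..node100 host names
are hypothetical; the mpirun flags and bwtest are taken from the loop
above. It covers each unordered pair once (the loop above runs both x,y
and y,x), and concurrent runs share the fabric, so their numbers may
perturb each other. Treat it as a screening pass, then re-run any
suspicious pair on its own.

```python
def round_robin_rounds(hosts):
    """Yield rounds of disjoint host pairs using the circle method.

    Every unordered pair of hosts appears in exactly one round, and no
    host appears twice within a round.
    """
    hosts = list(hosts)
    if len(hosts) % 2:
        hosts.append(None)  # "bye" slot when the host count is odd
    n = len(hosts)
    for _ in range(n - 1):
        # Pair the i-th host from the front with the i-th from the back.
        pairs = [(hosts[i], hosts[n - 1 - i]) for i in range(n // 2)]
        yield [(a, b) for a, b in pairs if a is not None and b is not None]
        # Keep the first host fixed and rotate the rest by one position.
        hosts = [hosts[0], hosts[-1]] + hosts[1:-1]


if __name__ == "__main__":
    # Hypothetical host names; substitute the real node list.
    hosts = [f"node{i}" for i in range(1, 101)]
    for pairs in round_robin_rounds(hosts):
        for x, y in pairs:
            # Print the commands instead of running them, so the output
            # can be reviewed and then piped to a shell.
            print(f"mpirun -n 2 -npernode 1 -host {x},{y} bwtest > {x}{y}.log &")
        print("wait")  # finish one round before starting the next
```

For 100 hosts this gives 99 rounds of 50 pairs each, so the number of
sequential mpirun start-ups drops from roughly 10k to 99 rounds.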

Also have you run  ibdiagnet to see if anything is flagged up?

i've run a multitude of ib diags on the machines, but nothing is
popping out as wrong.  what's weird is that it's only certain pairings
of machines, not any one machine in general.

I find most of the ibdiag* utilities to be of limited value when
debugging IB issues. Unfortunately, Mellanox's Unified Fabric Manager
(UFM) seems to be the only tool that's helpful for accurately monitoring
and identifying issues with IB networks. I've never used UFM myself, but
my friends at Princeton gave me a demo, and it seems like a fantastic
tool.

Unfortunately, it's a commercial product, and probably only works on
Mellanox hardware (you don't mention whether you're using QLogic or
Mellanox hardware). The good news is, you can download it and evaluate
it. I'd give that a try, if I were you.


http://www.mellanox.com/page/products_dyn?product_family=100

--
Prentice


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
