On 08/29/2014 11:30 AM, Michael Di Domenico wrote:
On Fri, Aug 29, 2014 at 9:32 AM, John Hearns <john.hea...@viglen.co.uk> wrote:
I would say the usual tool for that pair-wise comparison is Intel IMB (Intel MPI Benchmarks):
https://software.intel.com/en-us/articles/intel-mpi-benchmarks
I hope I have got your requirement correct!
John,

Close, but not exact. IMB will test ranks, but it won't tell me whether a
specific pair of ranks is slower than the others, only how the ranks
under test perform collectively. What I'm looking for is an MPI version of this:

for x in node{1..100}; do
  for y in node{1..100}; do
    [ "$x" = "$y" ] && continue   # skip pairing a node with itself
    mpirun -n 2 -npernode 1 -host "$x,$y" bwtest > "$x$y.log"
  done
done

Unfortunately, the mpirun task takes about 3 seconds per iteration, and
with roughly 10k iterations (100 x 99 = 9,900 ordered pairs, or about
eight hours), it's going to take a long time and I'm being impatient.
I've been trying to write the MPI code myself, but my MPI is a little
rusty, so it's slow going...
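One way to avoid paying the ~3 second mpirun startup 9,900 times (a swapped-in idea, not something proposed in this thread) is to launch a single MPI job with one rank per node and walk a round-robin schedule. In each round the ranks split into disjoint pairs, so all N(N-1)/2 unordered pairs are covered in N-1 rounds. The sketch below covers only the scheduling part, using the classic "circle method" (rank n-1 stays fixed, the others rotate). In an actual MPI code, each rank would compute its partner for round r with something like the hypothetical `partner()` helper, do a timed MPI_Send/MPI_Recv exchange with it (timed with MPI_Wtime), and call MPI_Barrier before the next round. It assumes an even rank count; pad with an idle dummy rank otherwise.

```python
def round_robin(n):
    """Circle-method schedule for ranks 0..n-1 (assumes n is even).

    Returns n-1 rounds; each round is a list of n/2 disjoint (a, b) pairs,
    and every unordered pair of ranks appears in exactly one round.
    """
    if n % 2:
        raise ValueError("n must be even; pad with an idle dummy rank")
    m = n - 1
    rounds = []
    for r in range(m):
        pairs = [(n - 1, r)]                 # the fixed rank meets rank r
        for k in range(1, n // 2):
            pairs.append(((r + k) % m, (r - k) % m))
        rounds.append(pairs)
    return rounds


def partner(rank, n, r):
    """Hypothetical helper: partner of `rank` in round r, computed locally
    (no communication needed), so every MPI rank could call it on its own."""
    m = n - 1
    if rank == n - 1:
        return r
    if rank == r:
        return n - 1
    return (2 * r - rank) % m               # pairs in a round sum to 2r mod m


if __name__ == "__main__":
    sched = round_robin(100)
    print(len(sched), "rounds of", len(sched[0]), "pairs")  # 99 rounds of 50 pairs
```

Running 50 pairs at once can load shared switch links, so any pair that looks slow in a concurrent pass is worth re-running on its own; alternatively, the single job can test one pair at a time while the other ranks wait, which still avoids the per-launch cost. The pairs are unordered, so testing each direction separately (as the ordered loop above does) would take a second pass with sender and receiver swapped.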

Also, have you run ibdiagnet to see if anything is flagged up?
I've run a multitude of IB diags on the machines, but nothing is
popping out as wrong. What's weird is that it's only certain pairings
of machines, not any one machine in general.
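Once the results are in an N x N matrix (parsed from the per-pair logs, or gathered by a single MPI job), a quick check for the "certain pairings, not one machine" pattern is to flag pairs well below the median and count how often each node appears among them. A node that shows up in most of the slow pairs points at that node, its HCA, or its cable; a few scattered pairs with otherwise healthy endpoints more likely point toward a particular route through the fabric. This Python sketch assumes `bw[i][j]` holds the bandwidth measured from node i to node j (diagonal ignored) and uses an arbitrary 0.8 x median cutoff; both are illustrative choices, not anything from the thread.

```python
from collections import Counter
from statistics import median


def slow_pairs(bw, frac=0.8):
    """Return (i, j, value) for ordered pairs below frac * median.

    Assumed layout: bw[i][j] is the bandwidth measured from node i to
    node j; diagonal entries are skipped. frac=0.8 is an arbitrary
    illustrative threshold.
    """
    n = len(bw)
    vals = [bw[i][j] for i in range(n) for j in range(n) if i != j]
    cutoff = frac * median(vals)
    return [(i, j, bw[i][j]) for i in range(n) for j in range(n)
            if i != j and bw[i][j] < cutoff]


def node_counts(pairs):
    """Count how many slow pairs each node participates in."""
    c = Counter()
    for i, j, _ in pairs:
        c[i] += 1
        c[j] += 1
    return c


if __name__ == "__main__":
    # Toy 4-node matrix with made-up numbers: only the 1<->2 path is slow.
    bw = [[None, 3000, 3000, 3000],
          [3000, None, 1200, 3000],
          [3000, 1200, None, 3000],
          [3000, 3000, 3000, None]]
    sp = slow_pairs(bw)
    print(sp)                     # [(1, 2, 1200), (2, 1, 1200)]
    print(dict(node_counts(sp)))  # {1: 2, 2: 2}
```

With 100 nodes, a node appearing in close to 99 ordered slow pairs per direction would be suspect on its own, whereas a count of one or two per node matches the pair-specific behavior described above.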


I find most of the ibdiag* utilities to be of limited value when debugging IB issues. Unfortunately, Mellanox's Unified Fabric Manager (UFM) seems to be the only tool that's helpful for accurately monitoring and identifying issues with IB networks. I've never used UFM myself, but my friends at Princeton gave me a demo, and it seems like a fantastic tool.

Unfortunately, it's a commercial product, and it probably only works on Mellanox hardware (you don't mention whether you're using QLogic or Mellanox hardware). The good news is that you can download it and evaluate it. I'd give that a try, if I were you.

http://www.mellanox.com/page/products_dyn?product_family=100

--
Prentice


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
