On 08/17/2017 12:35 PM, Joe Landman wrote:


On 08/17/2017 12:00 PM, Faraz Hussain wrote:
I noticed an MPI job was taking 5x longer to run whenever it landed on the compute node lusytp104. So I ran qperf and found that the bandwidth between it and any other node was ~100 MB/sec, much lower than the ~1 GB/sec between all the other nodes. Any tips on how to debug further? I haven't tried rebooting since the node is currently running a single-node job.

[hussaif1@lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
tcp_lat:
    latency  =  17.4 us
tcp_bw:
    bw  =  118 MB/sec
[hussaif1@lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
tcp_lat:
    latency  =  20.4 us
tcp_bw:
    bw  =  1.07 GB/sec

This is a separate issue from my previous post about a slow compute node. I am still investigating that per the helpful replies, and will post an update once I find the root cause!

That sounds very much like it is running over gigabit Ethernet rather than Infiniband. Check to make sure it is using the right network ...
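For example, a quick way to see which fabric the node is actually on (interface names here are assumptions; adjust ib0/eth0 to whatever your site uses):

```shell
# Is the Infiniband port up, and at what rate?
ibstat                       # port state should be "Active", rate e.g. 40 Gb/sec

# Which interface holds the address MPI is using?
ip -o addr show

# A gigabit Ethernet link betrays itself here:
ethtool eth0 | grep Speed    # "Speed: 1000Mb/s" would match the ~118 MB/sec you measured
```

Run the same commands on a good node and on lusytp104 and diff the output.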

Hi Faraz

As others have said in reply to your previous posting about Infiniband:

- Check whether the node is configured the same way as the other nodes:
in the case of Infiniband, whether the MTU is the same,
whether it is using connected or datagram mode, etc.
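The IPoIB settings are visible in sysfs, so comparing a good node against lusytp104 is a one-liner per setting ("ib0" is an assumed interface name):

```shell
# MTU: typically 65520 in connected mode, 2044 in datagram mode
cat /sys/class/net/ib0/mtu

# Transport mode: prints "connected" or "datagram"
cat /sys/class/net/ib0/mode
```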

**

Besides, for Open MPI you can force it at runtime not to use TCP:
--mca btl ^tcp
or use the syntax in this FAQ:
https://www.open-mpi.org/faq/?category=openfabrics#ib-btl

If that node has an Infiniband interface with a problem,
this should at least give a clue.
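A sketch of what the launch line looks like ("./my_mpi_app" and the process count are placeholders):

```shell
# Exclude the TCP BTL; the job fails loudly instead of silently
# falling back to Ethernet if Infiniband is unusable on a node:
mpirun --mca btl ^tcp -np 16 ./my_mpi_app

# Or, with the Open MPI 1.x-era syntax, select the desired BTLs explicitly:
mpirun --mca btl openib,sm,self -np 16 ./my_mpi_app
```

If lusytp104's HCA is broken, the first form should abort with an error naming that node rather than quietly running 5x slower.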

**

In addition, check the limits in the node.
That may be set by your resource manager,
or in /etc/security/limits.conf
or perhaps in the actual job script.
The memlock limit is key to Open MPI over Infiniband.
See FAQ 15, 16, 17 here:
https://www.open-mpi.org/faq/?category=openfabrics
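To compare the limits quickly, something like this on each node (the limits.conf lines shown are a common setting, not necessarily yours):

```shell
# Locked-memory limit as seen by a shell on the node;
# Open MPI over Infiniband generally wants "unlimited":
ulimit -l

# A common /etc/security/limits.conf setting (takes effect at next login):
#   * soft memlock unlimited
#   * hard memlock unlimited
```

Note that the resource manager may start jobs with different limits than an interactive login, so check from inside a job script too.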

**

Moreover, check if the mlx4_core.conf (assuming it is Mellanox HW)
is configured the same way across the nodes:

/etc/modprobe.d/mlx4_core.conf

See FAQ 18 here:
https://www.open-mpi.org/faq/?category=openfabrics
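For illustration, a file of that shape and a quick way to compare it across nodes (the parameter values below are examples only; the FAQ discusses how to size log_num_mtt so enough memory can be registered):

```shell
# Example /etc/modprobe.d/mlx4_core.conf contents:
#   options mlx4_core log_num_mtt=20 log_mtts_per_seg=4

# Compare the file across nodes by hash; a mismatch on lusytp104 stands out:
md5sum /etc/modprobe.d/mlx4_core.conf
```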

**

To increase the BTL diagnostic verbosity (which goes to STDERR, IIRC):

--mca btl_base_verbose 30

That may point out which interfaces are actually being used, etc.

See this FAQ:

https://www.open-mpi.org/faq/?category=all#diagnose-multi-host-problems
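Put together, a diagnostic run might look like this ("./my_mpi_app" is a placeholder):

```shell
# Capture the BTL selection messages from stderr:
mpirun --mca btl_base_verbose 30 -np 16 ./my_mpi_app 2> btl_debug.log

# Then look for which transports each rank actually chose:
grep -i -e openib -e tcp btl_debug.log
```

If the ranks on lusytp104 report the tcp BTL while the others report openib, that confirms the gigabit-Ethernet theory.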

**

Finally, as John has suggested before, you may want to
subscribe to the Open MPI mailing list,
and ask the question there as well:

https://www.open-mpi.org/community/help/
https://www.open-mpi.org/community/lists/

There you will get feedback from the Open MPI developers and
user community, which often includes insights from
Intel and Mellanox IB hardware experts.

**

I hope this helps.

Gus Correa



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

