On 08/17/2017 12:35 PM, Joe Landman wrote:


On 08/17/2017 12:00 PM, Faraz Hussain wrote:
I noticed an MPI job was taking 5x longer to run whenever it landed on the compute node lusytp104. So I ran qperf and found that the bandwidth between it and any other node was ~100 MB/sec, much lower than the ~1 GB/sec between all the other nodes. Any tips on how to debug further? I haven't tried rebooting since the node is currently running a single-node job.

[hussaif1@lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
tcp_lat:
    latency  =  17.4 us
tcp_bw:
    bw  =  118 MB/sec
[hussaif1@lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
tcp_lat:
    latency  =  20.4 us
tcp_bw:
    bw  =  1.07 GB/sec

This is a separate issue from my previous post about a slow compute node. I am still investigating that per the helpful replies, and will post an update once I find the root cause!

That sounds very much like it is running over gigabit Ethernet rather than Infiniband. Check to make sure it is using the right network ...
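For example, a quick way to see which fabric the node is actually on (interface names here are assumptions; adjust ib0/eth0 to whatever your site uses):

```shell
# Is the Infiniband port up, and at what rate?
ibstat                       # port state should be "Active", rate e.g. 40 Gb/sec

# Which interface holds the address MPI is using?
ip -o addr show

# A gigabit Ethernet link betrays itself here:
ethtool eth0 | grep Speed    # "Speed: 1000Mb/s" would match the ~118 MB/sec you measured
```

Run the same commands on a good node and on lusytp104 and diff the output.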

Hi Faraz

As others have said in reply to your previous posting about Infiniband:

- Check whether the node is configured the same way as the other nodes:
in the case of Infiniband, whether the MTU is the same,
whether it is using connected or datagram mode, etc.
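The IPoIB settings are visible in sysfs, so comparing a good node against lusytp104 is a one-liner per setting ("ib0" is an assumed interface name):

```shell
# MTU: typically 65520 in connected mode, 2044 in datagram mode
cat /sys/class/net/ib0/mtu

# Transport mode: prints "connected" or "datagram"
cat /sys/class/net/ib0/mode
```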

**

Besides, for Open MPI you can force it at runtime not to use TCP:
--mca btl ^tcp
or use the syntax in this FAQ:
https://www.open-mpi.org/faq/?category=openfabrics#ib-btl

If that node has an Infiniband interface with a problem,
this should at least give a clue.
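A sketch of what the launch line looks like ("./my_mpi_app" and the process count are placeholders):

```shell
# Exclude the TCP BTL; the job fails loudly instead of silently
# falling back to Ethernet if Infiniband is unusable on a node:
mpirun --mca btl ^tcp -np 16 ./my_mpi_app

# Or, with the Open MPI 1.x-era syntax, select the desired BTLs explicitly:
mpirun --mca btl openib,sm,self -np 16 ./my_mpi_app
```

If lusytp104's HCA is broken, the first form should abort with an error naming that node rather than quietly running 5x slower.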

**

In addition, check the limits in the node.
That may be set by your resource manager,
or in /etc/security/limits.conf
or perhaps in the actual job script.
The memlock limit is key to Open MPI over Infiniband.
See FAQ 15, 16, 17 here:
https://www.open-mpi.org/faq/?category=openfabrics
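To compare the limits quickly, something like this on each node (the limits.conf lines shown are a common setting, not necessarily yours):

```shell
# Locked-memory limit as seen by a shell on the node;
# Open MPI over Infiniband generally wants "unlimited":
ulimit -l

# A common /etc/security/limits.conf setting (takes effect at next login):
#   * soft memlock unlimited
#   * hard memlock unlimited
```

Note that the resource manager may start jobs with different limits than an interactive login, so check from inside a job script too.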

**

Moreover, check if the mlx4_core.conf (assuming it is Mellanox HW)
is configured the same way across the nodes:

/etc/modprobe.d/mlx4_core.conf

See FAQ 18 here:
https://www.open-mpi.org/faq/?category=openfabrics
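For illustration, a file of that shape and a quick way to compare it across nodes (the parameter values below are examples only; the FAQ discusses how to size log_num_mtt so enough memory can be registered):

```shell
# Example /etc/modprobe.d/mlx4_core.conf contents:
#   options mlx4_core log_num_mtt=20 log_mtts_per_seg=4

# Compare the file across nodes by hash; a mismatch on lusytp104 stands out:
md5sum /etc/modprobe.d/mlx4_core.conf
```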

**

To increase the BTL diagnostic verbosity (which goes to STDERR, IIRC):

--mca btl_base_verbose 30

That may point out which interfaces are actually being used, etc.

See this FAQ:

https://www.open-mpi.org/faq/?category=all#diagnose-multi-host-problems
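Put together, a diagnostic run might look like this ("./my_mpi_app" is a placeholder):

```shell
# Capture the BTL selection messages from stderr:
mpirun --mca btl_base_verbose 30 -np 16 ./my_mpi_app 2> btl_debug.log

# Then look for which transports each rank actually chose:
grep -i -e openib -e tcp btl_debug.log
```

If the ranks on lusytp104 report the tcp BTL while the others report openib, that confirms the gigabit-Ethernet theory.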

**

Finally, as John has suggested before, you may want to
subscribe to the Open MPI mailing list,
and ask the question there as well:

https://www.open-mpi.org/community/help/
https://www.open-mpi.org/community/lists/

There you will get feedback from the Open MPI developers and
user community, which often includes insights from
Intel and Mellanox IB hardware experts.

**

I hope this helps.

Gus Correa



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

