On 08/17/2017 02:02 PM, Scott Atchley wrote:
I would agree that the bandwidth points at 1 GigE in this case.
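
As a rough sanity check on that reading, the 118 MB/sec qperf figure quoted further down can be compared against the best-case TCP payload rate of a 1 Gbit/s Ethernet link. The sketch below is only an estimate under stated assumptions that are not from the thread: a standard 1500-byte MTU, IPv4, TCP with the 12-byte timestamp option, and no jumbo frames. The function name is made up for illustration.

```python
# Rough estimate of the best-case TCP payload rate over Ethernet.
# Assumptions (not from the thread): 1500-byte MTU, IPv4 (20 bytes),
# TCP header with timestamps (32 bytes), no jumbo frames.

def tcp_goodput_mb_per_s(link_mbit_per_s, mtu=1500, ip_tcp_header_bytes=52):
    """Return the approximate maximum TCP payload rate in MB/s (10^6 bytes)."""
    # Bytes on the wire per frame: MTU + Ethernet header (14) + FCS (4)
    # + preamble/SFD (8) + inter-frame gap (12).
    wire_bytes = mtu + 14 + 4 + 8 + 12
    payload_bytes = mtu - ip_tcp_header_bytes
    raw_mb_per_s = link_mbit_per_s / 8.0
    return raw_mb_per_s * payload_bytes / wire_bytes


if __name__ == "__main__":
    print(f"1 GigE: ~{tcp_goodput_mb_per_s(1000):.0f} MB/s")
```

Under those assumptions the ceiling for 1 GigE comes out at about 118 MB/s, so a measured 118 MB/sec looks like a saturated gigabit link rather than a degraded faster one.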

For IB/OPA cards running slower than expected, I would recommend checking that they have negotiated the expected number of PCIe lanes.

Turns out, there is a really nice open source tool that does this for you ...

https://github.com/joelandman/pcilist

:D
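
For anyone who wants to do the lane check by hand, here is a minimal sketch of one way to do it. It is not how pcilist works (that tool is not examined here). It simply compares the `LnkCap` (capable) and `LnkSta` (negotiated) lines that `lspci -vv` prints for each device. The function name and the parsing approach are illustrative assumptions, and the regular expression only covers the common `Speed ...GT/s, Width xN` layout.

```python
import re
import shutil
import subprocess

# Matches e.g. "LnkCap: Port #0, Speed 8GT/s, Width x8, ..." and
# "LnkSta: Speed 8GT/s (ok), Width x4 (downgraded)".
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def find_degraded_links(lspci_text):
    """Return (device, capable, negotiated) for links running below capability."""
    degraded = []
    device, cap = None, None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Device header lines start in column 0.
            device, cap = line.strip(), None
            continue
        m = LINK_RE.search(line)
        if not m:
            continue
        link = (float(m.group(2)), int(m.group(3)))  # (GT/s, lane count)
        if m.group(1) == "LnkCap":
            cap = link
        elif cap is not None and (link[0] < cap[0] or link[1] < cap[1]):
            degraded.append((device, cap, link))
    return degraded


if __name__ == "__main__":
    # The link fields are usually only visible when lspci runs as root.
    if shutil.which("lspci"):
        out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout
        for device, cap, sta in find_degraded_links(out):
            print(f"{device}: capable {cap[0]} GT/s x{cap[1]}, "
                  f"running {sta[0]} GT/s x{sta[1]}")
```

A card that reports, say, x8 in `LnkCap` but x4 in `LnkSta` would show up in the output. Some links legitimately train at a lower speed when idle, so a hit is a prompt to look closer, not proof of a fault.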


On Thu, Aug 17, 2017 at 12:35 PM, Joe Landman <joe.land...@gmail.com> wrote:



    On 08/17/2017 12:00 PM, Faraz Hussain wrote:

        I noticed an MPI job was taking 5X longer to run whenever it
        got the compute node lusytp104. So I ran qperf and found the
        bandwidth between it and any other node was ~100 MB/sec. This
        is much lower than the ~1 GB/sec between all the other nodes. Any
        tips on how to debug further? I haven't tried rebooting since
        it is currently running a single-node job.

        [hussaif1@lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
        tcp_lat:
            latency  =  17.4 us
        tcp_bw:
            bw  =  118 MB/sec
        [hussaif1@lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
        tcp_lat:
            latency  =  20.4 us
        tcp_bw:
            bw  =  1.07 GB/sec

        This is a separate issue from my previous post about a slow
        compute node. I am still investigating that per the helpful
        replies. Will post an update about that once I find the root
        cause!


    Sounds very much like it is running over Gigabit Ethernet rather than
    InfiniBand.  Check to make sure it is using the right network ...
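
One way to start that check is to list the link speed the kernel reports for each interface on the slow node. This is a minimal sketch, assuming a Linux host that exposes `/sys/class/net/<iface>/speed`. The function name is made up for illustration, and some interfaces (virtual devices, links that are down, possibly IPoIB) may not report a usable value.

```python
import os


def iface_speeds(sysfs_root="/sys/class/net"):
    """Map each network interface name to its reported speed in Mb/s (or None)."""
    speeds = {}
    for name in sorted(os.listdir(sysfs_root)):
        try:
            with open(os.path.join(sysfs_root, name, "speed")) as f:
                value = int(f.read().strip())
        except (OSError, ValueError):
            # Down links and virtual interfaces often have no readable speed.
            value = None
        # The kernel reports -1 when the speed is unknown.
        speeds[name] = value if value is not None and value > 0 else None
    return speeds


if __name__ == "__main__":
    if os.path.isdir("/sys/class/net"):
        for name, mbit in iface_speeds().items():
            print(f"{name}: {mbit if mbit else 'unknown'} Mb/s")
```

Running `ip route get <address of the other node>` then shows which of those interfaces the traffic to a given peer actually leaves through.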


        _______________________________________________
        Beowulf mailing list, Beowulf@beowulf.org
        sponsored by Penguin Computing
        To change your subscription (digest mode or unsubscribe) visit
        http://www.beowulf.org/mailman/listinfo/beowulf



--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
