Thanks Joe. Here is the output from the commands you suggested. We have Open MPI built with the Intel compilers. Is there a standard benchmark I can compile so that we are all comparing the same code?
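
For example, would the OSU micro-benchmarks be a reasonable common test? A sketch of what I had in mind (the version number and node names are placeholders, and the download URL is from memory):

    wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.3.2.tar.gz
    tar xzf osu-micro-benchmarks-5.3.2.tar.gz
    cd osu-micro-benchmarks-5.3.2
    ./configure CC=mpicc CXX=mpicxx && make

    # point-to-point bandwidth and latency across two nodes
    mpirun -np 2 --host node01,node02 mpi/pt2pt/osu_bw
    mpirun -np 2 --host node01,node02 mpi/pt2pt/osu_latency

If I understand 4X FDR10 correctly, osu_bw should peak somewhere around 4-5 GB/s over the IB fabric; if we only see GigE-like numbers (~100 MB/s), that would point at the MPI transport rather than the hardware.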

[hussaif1@lustwzb4 test]$ ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.11.550
        node_guid:                      f452:1403:0016:3b70
        sys_image_guid:                 f452:1403:0016:3b73
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       DEL0A40000028
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               3
                        port_lmc:               0x00
                        link_layer:             InfiniBand

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand

[hussaif1@lustwzb4 test]$ ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.11.550
        Hardware version: 0
        Node GUID: 0xf452140300163b70
        System image GUID: 0xf452140300163b73
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40 (FDR10)
                Base lid: 3
                LMC: 0
                SM lid: 1
                Capability mask: 0x02514868
                Port GUID: 0xf452140300163b71
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0xf452140300163b72
                Link layer: InfiniBand

[hussaif1@lustwzb4 test]$ ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:f452:1403:0016:3b71
        base lid:        0x3
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X FDR10)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:f452:1403:0016:3b72
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      3: Disabled
        rate:            10 Gb/sec (4X)
        link_layer:      InfiniBand
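
If I am reading the above right, port 1 is up (ACTIVE, 4X FDR10) and only the unused port 2 is down, so the fabric itself looks healthy. Could our jobs simply be falling back to TCP? I was going to test with something like this (a sketch; the openib/self/sm BTL names are my assumption for our Open MPI version):

    # force the InfiniBand BTL so the run fails loudly if openib is unusable
    mpirun -np 2 --host node01,node02 --mca btl openib,self,sm mpi/pt2pt/osu_bw

    # or just ask Open MPI to report which BTLs it selects
    mpirun -np 2 --host node01,node02 --mca btl_base_verbose 30 mpi/pt2pt/osu_bw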


Quoting Joe Landman <joe.land...@gmail.com>:

start with

    ibv_devinfo

    ibstat

    ibstatus


and see what (if anything) they report.

Second, how did you compile/run your MPI code?


On 08/02/2017 12:44 PM, Faraz Hussain wrote:
I have inherited a 20-node cluster that supposedly has an InfiniBand network. I am testing some MPI applications and am seeing no performance improvement with multiple nodes, so I am wondering whether the InfiniBand network even works.

The output of ifconfig -a shows ib0 and ib1 interfaces. I ran ethtool ib0 and it shows:

       Speed: 40000Mb/s
       Link detected: no

and for ib1 it shows:

       Speed: 10000Mb/s
       Link detected: no

Am I right in assuming this means the link is down? Any ideas on how to debug further and bring it back up?

Thanks!

--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
