Thanks Joe. Here is the output from the commands you suggested. We
have Open MPI built with the Intel compilers. Is there some standard
benchmark code I can compile so that we are all comparing the same code?
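For reference, a minimal two-rank ping-pong like the sketch below is the
kind of thing I could build identically everywhere (the OSU micro-benchmarks
or the Intel MPI Benchmarks would be the more standard choice; the 1 MiB
message size and iteration count here are arbitrary placeholders):

/* pingpong.c -- minimal two-rank MPI ping-pong (sketch only).
 * Rank 0 bounces a buffer off rank 1 and reports the average
 * round-trip time and aggregate bandwidth.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int iters = 1000;
    const int bytes = 1 << 20;   /* 1 MiB per message */
    int rank, size, i;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0)
            fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    buf = malloc(bytes);
    MPI_Barrier(MPI_COMM_WORLD);   /* line both ranks up before timing */
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg round trip: %.1f us, ~%.1f MB/s aggregate\n",
               (t1 - t0) / iters * 1e6,
               2.0 * bytes * iters / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

Built with mpicc -O2 -o pingpong pingpong.c and run with one rank on each
of two nodes, round trips in the low microseconds and bandwidth near line
rate would point at the InfiniBand fabric; hundreds of microseconds would
look more like a TCP-over-Ethernet fallback.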
[hussaif1@lustwzb4 test]$ ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.11.550
        node_guid:                      f452:1403:0016:3b70
        sys_image_guid:                 f452:1403:0016:3b73
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       DEL0A40000028
        phys_port_cnt:                  2
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        4096 (5)
                        active_mtu:     4096 (5)
                        sm_lid:         1
                        port_lid:       3
                        port_lmc:       0x00
                        link_layer:     InfiniBand

                port:   2
                        state:          PORT_DOWN (1)
                        max_mtu:        4096 (5)
                        active_mtu:     4096 (5)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     InfiniBand
[hussaif1@lustwzb4 test]$ ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.11.550
        Hardware version: 0
        Node GUID: 0xf452140300163b70
        System image GUID: 0xf452140300163b73
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40 (FDR10)
                Base lid: 3
                LMC: 0
                SM lid: 1
                Capability mask: 0x02514868
                Port GUID: 0xf452140300163b71
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0xf452140300163b72
                Link layer: InfiniBand
[hussaif1@lustwzb4 test]$ ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:f452:1403:0016:3b71
        base lid:        0x3
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X FDR10)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:f452:1403:0016:3b72
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      3: Disabled
        rate:            10 Gb/sec (4X)
        link_layer:      InfiniBand
Quoting Joe Landman <joe.land...@gmail.com>:
Start with:
ibv_devinfo
ibstat
ibstatus
and see what (if anything) they report.
Second, how did you compile/run your MPI code?
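For Open MPI, one quick sanity check, assuming the Open MPI wrappers are
on your PATH and "hosts" is a hostfile naming two nodes (your_app is a
placeholder; exact MCA option spellings vary across Open MPI versions):

mpicc -O2 -o your_app your_app.c
mpirun --mca btl openib,self,vader -np 2 -hostfile hosts ./your_app

Restricting the btl list to openib,self,vader makes the run fail loudly
if the InfiniBand transport cannot be brought up, instead of silently
falling back to TCP.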
On 08/02/2017 12:44 PM, Faraz Hussain wrote:
I have inherited a 20-node cluster that supposedly has an
InfiniBand network. I am testing some MPI applications and am
seeing no performance improvement with multiple nodes, so I am
wondering whether the InfiniBand network even works.

The output of ifconfig -a shows an ib0 and an ib1 interface. I ran
ethtool ib0 and it shows:

        Speed: 40000Mb/s
        Link detected: no

and for ib1 it shows:

        Speed: 10000Mb/s
        Link detected: no

I am assuming this means the link is down? Any idea how to debug
further and restart it?
Thanks!
--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf