It may be using IPoIB (TCP/IP over IB), not verbs/RDMA. You can force it to use openib (verbs, RDMA) with the following (vader is the in-node shared-memory BTL):
mpirun --mca btl openib,self,vader ...

This flag may also help tell which BTL (byte transfer layer) is being used:

--mca btl_base_verbose 30

(a combined sketch of building and running with these flags appears after the quoted messages below)

See these FAQ entries:
https://www.open-mpi.org/faq/?category=openfabrics#ib-btl
https://www.open-mpi.org/faq/?category=all#tcp-routability-1.3

Better yet, ask for more details on the Open MPI list. They are the pros!

My two cents,
Gus Correa

On Tue, Apr 30, 2019 at 3:57 PM Faraz Hussain <i...@feacluster.com> wrote:

> Thanks, after building openmpi 4 from source, it now works! However it
> still gives this message below when I run openmpi with the verbose setting:
>
>   No OpenFabrics connection schemes reported that they were able to be
>   used on a specific port. As such, the openib BTL (OpenFabrics
>   support) will be disabled for this port.
>
>     Local host:       lustwzb34
>     Local device:     mlx4_0
>     Local port:       1
>     CPCs attempted:   rdmacm, udcm
>
> However, the results from my latency and bandwidth tests seem to be
> what I would expect from InfiniBand. See:
>
> [hussaif1@lustwzb34 pt2pt]$ mpirun -v -np 2 -hostfile ./hostfile ./osu_latency
> # OSU MPI Latency Test v5.3.2
> # Size       Latency (us)
> 0                    1.87
> 1                    1.88
> 2                    1.93
> 4                    1.92
> 8                    1.93
> 16                   1.95
> 32                   1.93
> 64                   2.08
> 128                  2.61
> 256                  2.72
> 512                  2.93
> 1024                 3.33
> 2048                 3.81
> 4096                 4.71
> 8192                 6.68
> 16384                8.38
> 32768               12.13
> 65536               19.74
> 131072              35.08
> 262144              64.67
> 524288             122.11
> 1048576            236.69
> 2097152            465.97
> 4194304            926.31
>
> [hussaif1@lustwzb34 pt2pt]$ mpirun -v -np 2 -hostfile ./hostfile ./osu_bw
> # OSU MPI Bandwidth Test v5.3.2
> # Size     Bandwidth (MB/s)
> 1                      3.09
> 2                      6.35
> 4                     12.77
> 8                     26.01
> 16                    51.31
> 32                   103.08
> 64                   197.89
> 128                  362.00
> 256                  676.28
> 512                 1096.26
> 1024                1819.25
> 2048                2551.41
> 4096                3886.63
> 8192                3983.17
> 16384               4362.30
> 32768               4457.09
> 65536               4502.41
> 131072              4512.64
> 262144              4531.48
> 524288              4537.42
> 1048576             4510.69
> 2097152             4546.64
> 4194304             4565.12
>
> When I run ibv_devinfo I get:
>
> [hussaif1@lustwzb34 pt2pt]$ ibv_devinfo
> hca_id: mlx4_0
>         transport:              InfiniBand (0)
>         fw_ver:                 2.36.5000
>         node_guid:              480f:cfff:fff5:c6c0
>         sys_image_guid:         480f:cfff:fff5:c6c3
>         vendor_id:              0x02c9
>         vendor_part_id:         4103
>         hw_ver:                 0x0
>         board_id:               HP_1360110017
>         phys_port_cnt:          2
>         Device ports:
>                 port:   1
>                         state:          PORT_ACTIVE (4)
>                         max_mtu:        4096 (5)
>                         active_mtu:     1024 (3)
>                         sm_lid:         0
>                         port_lid:       0
>                         port_lmc:       0x00
>                         link_layer:     Ethernet
>
>                 port:   2
>                         state:          PORT_DOWN (1)
>                         max_mtu:        4096 (5)
>                         active_mtu:     1024 (3)
>                         sm_lid:         0
>                         port_lid:       0
>                         port_lmc:       0x00
>                         link_layer:     Ethernet
>
> I will ask the openmpi mailing list whether my results make sense!
>
>
> Quoting Gus Correa <g...@ldeo.columbia.edu>:
>
> > Hi Faraz
> >
> > By all means, download the Open MPI tarball and build from source.
> > Otherwise there won't be support for IB (the CentOS Open MPI packages
> > most likely rely only on TCP/IP).
> >
> > Read their README file (it comes in the tarball), and take a careful look
> > at their (excellent) FAQ:
> > https://www.open-mpi.org/faq/
> > Many issues can be solved by just reading these two resources.
> >
> > If you hit more trouble, subscribe to the Open MPI mailing list and ask
> > questions there, because you will get advice directly from the Open MPI
> > developers, and the fix will come easily.
> > https://www.open-mpi.org/community/lists/ompi.php
> >
> > My two cents,
> > Gus Correa
> >
> > On Tue, Apr 30, 2019 at 3:07 PM Faraz Hussain <i...@feacluster.com> wrote:
> >
> >> Thanks, yes I have installed those libraries. See below. Initially I
> >> installed the libraries via yum.
> >> But then I tried installing the RPMs
> >> directly from the Mellanox website
> >> (MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tar). Even after doing
> >> that, I still got the same error with openmpi. I will try your
> >> suggestion of building openmpi from source next!
> >>
> >> root@lustwzb34:/root # yum list | grep ibverbs
> >> libibverbs.x86_64                41mlnx1-OFED.4.5.0.1.0.45101
> >> libibverbs-devel.x86_64          41mlnx1-OFED.4.5.0.1.0.45101
> >> libibverbs-devel-static.x86_64   41mlnx1-OFED.4.5.0.1.0.45101
> >> libibverbs-utils.x86_64          41mlnx1-OFED.4.5.0.1.0.45101
> >> libibverbs.i686                  17.2-3.el7      rhel-7-server-rpms
> >> libibverbs-devel.i686            1.2.1-1.el7     rhel-7-server-rpms
> >>
> >> root@lustwzb34:/root # lsmod | grep ib
> >> ib_ucm                 22602  0
> >> ib_ipoib              168425  0
> >> ib_cm                  53141  3 rdma_cm,ib_ucm,ib_ipoib
> >> ib_umad                22093  0
> >> mlx5_ib               339961  0
> >> ib_uverbs             121821  3 mlx5_ib,ib_ucm,rdma_ucm
> >> mlx5_core             919178  2 mlx5_ib,mlx5_fpga_tools
> >> mlx4_ib               211747  0
> >> ib_core               294554  10 rdma_cm,ib_cm,iw_cm,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
> >> mlx4_core             360598  2 mlx4_en,mlx4_ib
> >> mlx_compat             29012  15 rdma_cm,ib_cm,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,mlx5_fpga_tools,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
> >> devlink                42368  4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
> >> libcrc32c              12644  3 xfs,nf_nat,nf_conntrack
> >> root@lustwzb34:/root #
> >>
> >>
> >> > Did you install libibverbs (and libibverbs-utils, for information and
> >> > troubleshooting)?
> >>
> >> > yum list | grep ibverbs
> >>
> >> > Are you loading the ib modules?
> >>
> >> > lsmod | grep ib
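
Putting the advice in this thread together, a minimal sketch might look like the following. The tarball version, the --with-verbs configure option, the install prefix, and the log filtering are assumptions for illustration, not commands from the thread (check the README of the release you actually download); the hostfile and the OSU binaries are the ones used above, and the exact wording of the verbose BTL output varies between Open MPI versions.

    # Build Open MPI from source with verbs (openib) support.
    # openmpi-4.0.1 and the install prefix are hypothetical choices;
    # --with-verbs is assumed to point configure at the system libibverbs.
    tar xf openmpi-4.0.1.tar.gz
    cd openmpi-4.0.1
    ./configure --prefix=$HOME/sw/openmpi-4.0.1 --with-verbs
    make -j 4 && make install

    # Force the openib BTL (self = loopback, vader = in-node shared memory)
    # and raise BTL verbosity so the chosen transport is reported.
    # Because tcp is not in the list, the job should fail rather than
    # silently fall back to IPoIB if openib cannot be used.
    $HOME/sw/openmpi-4.0.1/bin/mpirun \
        --mca btl openib,self,vader --mca btl_base_verbose 30 \
        -np 2 -hostfile ./hostfile ./osu_latency 2>&1 | tee osu_latency.log

    # Rough check: openib mentions in the verbose output suggest
    # verbs/RDMA is actually being used for this run.
    grep -i openib osu_latency.log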
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf