Faraz, I didn't see any tests that actually exercised the IP layer. You should run some iperf tests between nodes to make sure IPoIB works. InfiniBand/RDMA can be working fine while IPoIB is broken. As in any IP environment, the IPoIB configuration has to be the same on all nodes (network/subnet, netmask, MTU, etc.), and all of the nodes have to be set to the same mode (connected vs. datagram). If you can't run iperf, then something is broken in the IPoIB configuration.
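A rough sketch of that consistency check in Python, in case it helps. It assumes you have already gathered each node's settings into a dict (by ssh, pdsh, or by hand; that part is left out). The node names and addresses below are made up for illustration. `read_local_settings` reads `/sys/class/net/<iface>/mtu` and `/sys/class/net/<iface>/mode`, which is where Linux normally exposes these for an IPoIB interface; check that the paths exist on your distribution before relying on it.

```python
from collections import Counter
from pathlib import Path

# Settings that should be identical on every node for IPoIB to work.
KEYS = ("network", "netmask", "mtu", "mode")


def read_local_settings(iface="ib0"):
    """Read the MTU and IPoIB mode of one interface from Linux sysfs."""
    base = Path("/sys/class/net") / iface
    return {
        "mtu": (base / "mtu").read_text().strip(),
        "mode": (base / "mode").read_text().strip(),  # "connected" or "datagram"
    }


def find_mismatches(settings_by_node):
    """Return {key: {node: value}} for nodes that differ from the majority value."""
    mismatches = {}
    for key in KEYS:
        values = {node: s.get(key) for node, s in settings_by_node.items()}
        if not values:
            continue
        majority, _ = Counter(values.values()).most_common(1)[0]
        odd = {node: v for node, v in values.items() if v != majority}
        if odd:
            mismatches[key] = odd
    return mismatches


if __name__ == "__main__":
    # Hypothetical cluster: node03 was left in datagram mode with a small MTU.
    good = {"network": "10.0.0.0", "netmask": "255.255.255.0",
            "mtu": "65520", "mode": "connected"}
    nodes = {
        "node01": dict(good),
        "node02": dict(good),
        "node03": dict(good, mtu="2044", mode="datagram"),
    }
    for key, odd in find_mismatches(nodes).items():
        print(key, odd)
```

An empty result only means the settings agree. It does not prove that IP traffic flows, so the iperf run between nodes is still the real test.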
--Jeff

On Thu, Aug 3, 2017 at 8:41 AM, Faraz Hussain <i...@feacluster.com> wrote:

> Thanks for everyone's help. Using the Ohio State tests, qperf and
> perfquery I am convinced the IB network is working. The only thing that
> still bothers me is I can not get mpirun to use the tcp network. I tried
> all combinations of --mca btl to no avail. It is not important, more just
> curiosity.
>
> Quoting Michael Di Domenico <mdidomeni...@gmail.com>:
>
>> On Thu, Aug 3, 2017 at 10:10 AM, Faraz Hussain <i...@feacluster.com> wrote:
>>
>>> Thanks, I installed the MPI tests from Ohio State. I ran osu_bw and got
>>> the results below. What is confusing is I get the same result if I use
>>> tcp or openib ( by doing --mca btl openib|tcp,self with my mpirun
>>> command ). I also tried changing the environment variable:
>>> export OMPI_MCA_btl=tcp,self,sm . Results are the same regardless of
>>> tcp or openib..
>>>
>>> And when I do ifconfig -a I still see zero traffic reported for the ib0
>>> and ib1 network.
>>
>> if openmpi uses RDMA for the traffic ib0/ib1 will not show traffic,
>> you have to use perfquery
>> _______________________________________________
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf

--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage