Shaun,

Lemme know if you have an mvapich2 kit that I can test with iwarp...
Thanks,

Steve.

On Wed, 2007-02-14 at 23:31 -0500, Shaun Rowland wrote:
> Roland Dreier wrote:
> > > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
> > > built, at least by looking at the .so file result:
> > >
> > > [EMAIL PROTECTED] ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
> > > libibverbs.a
> > > libibverbs.so
> > > libibverbs.so.1
> > > libibverbs.so.1.0.0
> >
> > The soname hasn't changed because the library is still compatible.
> > But (I hope at least) OFED has libibverbs 1.1.
>
> The soname is libibverbs.so.1, so I guess the longer name would not
> matter anyway. Clearly, what I posted shows the IBVERBS 1.1 ABI is
> there. I think I have figured out why our code has this problem. The
> problem below is similar to the original one posted about.
>
> I did some experimentation with the srq_pingpong libibverbs example
> code. First I built it directly with:
>
> gcc -g -c pingpong.c -I/usr/local/ofed/include
> gcc -g -c -D_GNU_SOURCE srq_pingpong.c -I/usr/local/ofed/include
> gcc -g -o srq_pingpong srq_pingpong.o pingpong.o -L/usr/local/ofed/lib64 -libverbs
>
> This works. Next I copied srq_pingpong.c to two files:
>
> srq_pingpong_rowland.c
>   - just has a main function that calls lib_start().
>
> srq_pingpong_lib_rowland.c
>   - main() changed to lib_start().
>
> This moves all of the SRQ pingpong code into a shared library. If I
> build this shared library in this way, it works:
>
> gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include
> gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include
> gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o -L/usr/local/ofed/lib64 -libverbs
> gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -lsrqtest
>
> Above I am linking libibverbs directly into my libsrqtest.so
> library. This works and the IBVERBS 1.1 ABI is clearly in the
> libsrqtest.so file:
>
> [EMAIL PROTECTED] ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
>                  U ibv_ack_cq_events@@IBVERBS_1.1
>                  U ibv_alloc_pd@@IBVERBS_1.1
>                  U ibv_close_device@@IBVERBS_1.1
>                  U ibv_create_comp_channel@@IBVERBS_1.0
>                  U ibv_create_cq@@IBVERBS_1.1
>                  U ibv_create_qp@@IBVERBS_1.1
>                  U ibv_create_srq@@IBVERBS_1.1
>                  U ibv_dealloc_pd@@IBVERBS_1.1
>                  U ibv_dereg_mr@@IBVERBS_1.1
>                  U ibv_destroy_comp_channel@@IBVERBS_1.0
>
> However, if I build in a similar way to MVAPICH2, the resulting program
> fails:
>
> gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include
> gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include
> gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o
> gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -L/usr/local/ofed/lib64 -lsrqtest -libverbs
>
> Above I am not linking libibverbs into libsrqtest.so, thus it is
> required on the last gcc line. This is how MVAPICH2's libmpich.so file
> works, and from past experience, I've seen this before. Running shows:
>
> [EMAIL PROTECTED] ibverbs-examples]$ gdb ./srq_pingpong_rowland
> GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
> libthread_db library "/lib64/tls/libthread_db.so.1".
>
> (gdb) r
> Starting program:
> /home/7/rowland/z1-test/ibverbs-examples/srq_pingpong_rowland
> [Thread debugging using libthread_db enabled]
> [New Thread 182896403968 (LWP 29858)]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 182896403968 (LWP 29858)]
> post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0,
>     bad_wr=0x7fbfff88c8) at src/compat-1_0.c:312
> 312     src/compat-1_0.c: No such file or directory.
>         in src/compat-1_0.c
> (gdb) bt
> #0  post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0,
>     bad_wr=0x7fbfff88c8) at src/compat-1_0.c:312
> #1  0x0000002a95559e12 in ibv_post_srq_recv (srq=0x5075b0,
>     recv_wr=0x7fbfff88d0, bad_recv_wr=0x7fbfff88c8)
>     at /usr/local/ofed/include/infiniband/verbs.h:915
> #2  0x0000002a95559dcf in pp_post_recv (ctx=0x5023d0, n=500)
>     at srq_pingpong_lib_rowland.c:496
> #3  0x0000002a9555a614 in lib_start (argc=1, argv=0x7fbffff7f8)
>     at srq_pingpong_lib_rowland.c:696
> #4  0x0000000000400608 in main (argc=1, argv=0x7fbffff7f8)
>     at srq_pingpong_rowland.c:36
> (gdb) quit
>
> It is not clear to me why the difference of either linking libibverbs
> into libsrqtest.so or not doing so causes the IBVERBS 1.1 ABI to be used
> or not. I looked at the libibverbs code, and the 1.1 ABI is the default.
> The libsrqtest.so file in the above case seems to have lost this
> information:
>
> [EMAIL PROTECTED] ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
>                  U ibv_ack_cq_events
>                  U ibv_alloc_pd
>                  U ibv_close_device
>                  U ibv_create_comp_channel
>                  U ibv_create_cq
>                  U ibv_create_qp
>                  U ibv_create_srq
>                  U ibv_dealloc_pd
>                  U ibv_dereg_mr
>                  U ibv_destroy_comp_channel
>
> I've never had to deal with an ABI issue like this in shared library
> linking/usage. Does it make sense for this to be the case? I think
> perhaps it does, but I wanted to ask.
>
> I've placed my test code here if it helps:
>
> http://www.cse.ohio-state.edu/~rowland/ibverbs-examples.tar.gz
>
> I have a fix for our code that I am testing now. It seems to work and
> solve the observed problems, but more testing will be required to be
> sure there are no issues. This will require a new SRPM if the fix is
> required, which it seems at this point.
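
The two nm listings quoted above differ only in whether the undefined ibv_* references carry an @@IBVERBS_x.y tag. That comparison can be scripted. What follows is a minimal sketch, assuming nm output in the format shown in the quoted message; the function name unversioned_ibv_syms is hypothetical and not part of OFED, libibverbs, or MVAPICH2.

```shell
# Read `nm` output on stdin and print every undefined ibv_* symbol that
# has no version tag (no "@" in its name), one symbol per line.
# Hypothetical helper for illustration; not part of any OFED tool.
unversioned_ibv_syms() {
    # Undefined symbols have no address column, so after whitespace
    # splitting the type letter "U" is field 1 and the name is field 2.
    awk '$1 == "U" && $2 ~ /^ibv_/ && $2 !~ /@/ { print $2 }'
}
```

Used as `nm libsrqtest.so | unversioned_ibv_syms`, it should print nothing for the build that links libibverbs into libsrqtest.so and list names such as ibv_alloc_pd for the build that does not, which is the build that crashed in compat-1_0.c in the quoted report.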
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
