Hi Roland, I am running a proprietary test over ofed1.1 (userspace).
I have one context where I poll my CQ and another (a signal handler context) where I try to destroy my QP. It looks like mthca_destroy_qp is trying to take a lock that mthca_poll_cq is holding. The deadlock occurs at the end of the test run, when there are no more completions, so the test never exits.

Here is a backtrace from the core dump:

#0 0x0000003a6ce09172 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
#1 0x0000002a959cf449 in mthca_cq_clean (cq=0x607240, qpn=3277830, srq=0x0) at src/cq.c:554
#2 0x0000002a959d28b9 in mthca_destroy_qp (qp=0x607400) at src/mthca.h:246
#3 0x000000000040117b in client_sig_handler ()
#4 <signal handler called>
#5 0x0000003a6ce09165 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
#6 0x0000002a959cec91 in mthca_poll_cq (ibcq=0x607240, ne=1, wc=0x7fbffff590) at src/cq.c:467
#7 0x0000002a9557bf73 in ibv_poll_cq (cq=0x607240, num_entries=1, wc=0x7fbffff590) at /usr/local/ofed/include/infiniband/verbs.h:824

Does destroy_qp need to depend on the CQ? Do you have any suggestions?

Thanks,
Guy
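
For reference, the backtrace shows the signal handler running on top of mthca_poll_cq (frames #4 to #6), so both spin lock calls on cq=0x607240 come from the same thread. Below is a minimal sketch of one possible way to avoid that re-entrancy, assuming the test can defer the teardown: the handler only sets a flag, and the polling loop destroys the QP from normal context once it sees the flag. The names stop_requested, poll_once, destroy_qp_placeholder, install_handler and run_poll_loop are hypothetical; poll_once and destroy_qp_placeholder are stand-ins for the real ibv_poll_cq() and ibv_destroy_qp() calls, not libibverbs code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the signal handler, read by the polling loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Counters used only to observe what the sketch did. */
static int polls_done = 0;
static int qp_destroyed = 0;

/* The handler only records the request; it takes no locks and
 * makes no verbs calls, so it cannot re-enter the CQ lock. */
static void client_sig_handler(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Hypothetical stand-in for one ibv_poll_cq() call. */
static int poll_once(void)
{
    polls_done++;
    return 0; /* no completions */
}

/* Hypothetical stand-in for ibv_destroy_qp(). */
static void destroy_qp_placeholder(void)
{
    qp_destroyed = 1;
}

/* Install the flag-setting handler for the given signal. */
static int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = client_sig_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Poll until a stop is requested, then tear down from normal
 * (non-signal) context, where no CQ lock is held by this thread. */
static void run_poll_loop(void)
{
    while (!stop_requested)
        poll_once();
    destroy_qp_placeholder();
}
```

In a real test, run_poll_loop would call ibv_poll_cq() and then ibv_destroy_qp() in place of the two stand-ins; whether this fits depends on how the test is structured.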
