On 10/26/20 7:25 AM, Jason Gunthorpe wrote:
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the handler triggers a completion and another thread does rdma_connect() or the handler directly calls rdma_connect(). In all cases rdma_connect() needs to hold the handler_mutex, but when handler's are invoked this is already held by the core code. This causes ULPs using the 2nd method to deadlock. Provide a rdma_connect_locked() and have all ULPs call it from their handlers. Reported-by: Guoqing Jiang <[email protected]> Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state" Signed-off-by: Jason Gunthorpe <[email protected]> ---
[....]
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c index 06603dd1c8aa38..b36b60668b1da9 100644 --- a/net/rds/ib_cm.c +++ b/net/rds/ib_cm.c @@ -956,9 +956,10 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6) rds_ib_cm_fill_conn_param(conn, &conn_param, &dp, conn->c_proposed_version, UINT_MAX, UINT_MAX, isv6); - ret = rdma_connect(cm_id, &conn_param); + ret = rdma_connect_locked(cm_id, &conn_param); if (ret) - rds_ib_conn_error(conn, "rdma_connect failed (%d)\n", ret); + rds_ib_conn_error(conn, "rdma_connect_locked failed (%d)\n", + ret);out:/* Beware - returning non-zero tells the rdma_cm to destroy
For RDS part, Acked-by: Santosh Shilimkar <[email protected]>
