async_reply_handle_thread_unsafe() can run while holding
pending_requests.lock and currently calls rte_eal_alarm_cancel().

rte_eal_alarm_cancel() may spin-wait for an executing callback, which can
deadlock if that callback is blocked on the same lock.

Remove callback-side alarm cancellation. This is safe because the alarm
callback looks up the request by ID, so a callback that fires after its
request has been removed finds nothing and is a no-op.
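
As an illustration only, the stale-alarm behaviour can be sketched outside
of EAL. Everything named demo_* below is hypothetical and is not the code
in eal_common_proc.c: a fixed-size table stands in for the pending request
list, a pthread mutex stands in for pending_requests.lock, and the timer
callback is called by hand rather than by the alarm subsystem.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending request; not the EAL structure. */
struct demo_request {
	uint64_t id;
	bool in_use;
};

#define DEMO_MAX_REQUESTS 8

static struct demo_request demo_requests[DEMO_MAX_REQUESTS];
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int demo_timeouts_handled;

/* Find a live request by ID. The caller must hold demo_lock. */
static struct demo_request *
demo_find_unsafe(uint64_t id)
{
	for (size_t i = 0; i < DEMO_MAX_REQUESTS; i++)
		if (demo_requests[i].in_use && demo_requests[i].id == id)
			return &demo_requests[i];
	return NULL;
}

/* Register a request. Returns 0 on success, -1 if the table is full. */
static int
demo_add(uint64_t id)
{
	int ret = -1;

	pthread_mutex_lock(&demo_lock);
	for (size_t i = 0; i < DEMO_MAX_REQUESTS; i++) {
		if (!demo_requests[i].in_use) {
			demo_requests[i].id = id;
			demo_requests[i].in_use = true;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&demo_lock);
	return ret;
}

/*
 * Reply path: remove the request while holding the lock. No timer is
 * cancelled here, so nothing can spin-wait on a running callback.
 */
static bool
demo_complete(uint64_t id)
{
	bool found = false;

	pthread_mutex_lock(&demo_lock);
	struct demo_request *req = demo_find_unsafe(id);
	if (req != NULL) {
		req->in_use = false;
		found = true;
	}
	pthread_mutex_unlock(&demo_lock);
	return found;
}

/*
 * Timer callback: the argument carries the request ID, not a pointer.
 * If the request is already gone, the lookup fails and nothing happens.
 */
static void
demo_timeout_cb(void *arg)
{
	uint64_t id = (uint64_t)(uintptr_t)arg;

	pthread_mutex_lock(&demo_lock);
	struct demo_request *req = demo_find_unsafe(id);
	if (req != NULL) {
		req->in_use = false;
		demo_timeouts_handled++;
	}
	pthread_mutex_unlock(&demo_lock);
}
```

In this sketch, calling demo_complete(1) and then demo_timeout_cb() with
ID 1 leaves demo_timeouts_handled unchanged, while a callback for a
request that is still pending times it out exactly once.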

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: [email protected]

Signed-off-by: Anatoly Burakov <[email protected]>
---
 lib/eal/common/eal_common_proc.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index d1a041b707..830c11f4ac 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -549,19 +549,6 @@ async_reply_handle_thread_unsafe(struct pending_request *req)
 
        TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-       if (rte_eal_alarm_cancel(async_reply_handle,
-                       (void *)(uintptr_t)req->id) < 0) {
-               /* if we failed to cancel the alarm because it's already in
-                * progress, don't proceed because otherwise we will end up
-                * handling the same message twice.
-                */
-               if (rte_errno == EINPROGRESS) {
-                       EAL_LOG(DEBUG, "Request handling is already in progress");
-                       goto no_trigger;
-               }
-               EAL_LOG(ERR, "Failed to cancel alarm");
-       }
-
        if (action == ACTION_TRIGGER)
                return req;
 no_trigger:
@@ -910,8 +897,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
                return -1;
        }
 
-       /* Set alarm before allocating or sending so request timeout tracking
-        * is active as soon as this request ID is reserved.
+       /* Set alarm before allocating or sending. The alarm is never cancelled:
+        * rte_eal_alarm_cancel spin-waits for an executing callback to finish,
+        * which deadlocks if we hold pending_requests.lock while the callback
+        * is blocked on it. Instead, let stale alarms fire; with ID-based
+        * lookup the callback will simply not find the request and return
+        * harmlessly.
         */
        id = ++next_request_id;
        if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-- 
2.47.3
