On Thu, 21 May 2026 23:59:41 -0700 Wei Hu <[email protected]> wrote:
> From: Wei Hu <[email protected]> > > Add support for handling hardware service reset events in the > MANA driver. When the MANA kernel driver receives a hardware > service event, it initiates a device reset and notifies userspace > via IBV_EVENT_DEVICE_FATAL. The MANA PMD handles this by > performing an automatic teardown and recovery sequence. > > The driver uses ethdev recovery events (ERR_RECOVERING, > RECOVERY_SUCCESS, RECOVERY_FAILED) to notify upper layers of > the reset lifecycle, and a PCI device removal event callback > to distinguish hot-remove from service reset. > > Changes since v2: > - Fixed dev_state_qsv memory leak on device removal > - Fixed reset thread TCB/stack leak: reset_thread_active is now > only cleared by the joiner, not the thread itself > - Fixed second reset crash: removed reset thread join logic from > mana_dev_close (inner function) to avoid corrupting dev_state > when called from mana_reset_enter > - Made reset_thread_active RTE_ATOMIC(bool) with explicit ordering > - Added retry loop for rte_dev_event_callback_unregister on -EAGAIN > - Initialized condvar/mutex with PTHREAD_PROCESS_SHARED since priv > is in hugepage shared memory > - Added re-check of dev_state after lock acquisition in > mana_intr_handler to prevent racing with pci_remove_event_cb > - Replaced (void *)0 with NULL in mp.c > - Added lock ownership comment block at mana_reset_enter > - Documented rte_dev_event_monitor_start() requirement > - Added mana.rst documentation and release note (patch 2/2) > > Changes since v1: > - Removed net/netvsc patch from this series > - Simplified reset exit: mana_reset_exit calls > mana_reset_exit_delay directly instead of spawning a thread > - Added __rte_no_thread_safety_analysis annotations for clang > - Switched to rte_thread_create_internal_control > - Fixed declaration-after-statement style issues > - Removed unnecessary blank lines and stale comments > > Wei Hu (2): > net/mana: add device reset support > net/mana: add documentation for device reset support > > doc/guides/nics/mana.rst | 33 + > doc/guides/rel_notes/release_26_07.rst | 8 + > drivers/net/mana/mana.c | 995 ++++++++++++++++++++++--- > drivers/net/mana/mana.h | 33 +- > drivers/net/mana/meson.build | 2 +- > drivers/net/mana/mp.c | 89 ++- > drivers/net/mana/mr.c | 6 +- > drivers/net/mana/rx.c | 24 +- > drivers/net/mana/tx.c | 40 +- > 9 files changed, 1123 insertions(+), 107 deletions(-) > AI review found a few things... Thanks for the v3. Most of the v2 items are addressed: dev_state_qsv is freed via mana_dev_free_resources, reset_thread_active is now RTE_ATOMIC(bool), the joiner owns the flag, pthread mutex/cond use PROCESS_SHARED attrs, intr_handler re-checks state under the lock, the lock hand-off has a clear comment, and a doc + release note patch is included. A few items remain or were introduced by the v2->v3 changes. Errors 1. Deadlock in dev_close_lock + EAGAIN loop on PCI-remove callback. The v3 fix for the discarded unregister return uses a busy-wait: if (dev->device) { do { ret = rte_dev_event_callback_unregister(...); } while (ret == -EAGAIN); } mana_intr_uninstall runs from mana_dev_close, which runs from mana_dev_close_lock with reset_ops_lock held. EAL returns -EAGAIN while cb_lst->active is 1 (callback dispatching). The callback in question is mana_pci_remove_event_cb, which itself blocks on rte_spinlock_lock(&priv->reset_ops_lock). Result: dev_close holds the spinlock, the callback waits for the spinlock, the loop waits for the callback to finish -- three-way deadlock if a hot-remove event arrives during close. Either drop the lock before unregistering and re-take it after, or (better) use a sleeping mutex for reset_ops_lock so the callback can complete while close is suspended. 2. reset_ops_lock is still rte_spinlock_t held across blocking work. Unchanged from v2: the spinlock is held through - mana_reset_enter: rte_rcu_qsbr_check spin, mana_dev_stop, mana_mp_req_on_rxtx (5s IPC timeout), mana_dev_close - mana_reset_exit_delay: ibv_close_device, mana_pci_probe, mana_mp_req_on_rxtx, mana_dev_start Any other ethdev op that hits dev_ops while reset is in flight spins for tens of seconds. Blocking calls under a spinlock are an Error; use a properly-initialised pthread_mutex_t (you already have PROCESS_SHARED attrs handy). Warnings 3. Secondary process: qsbr does not actually quiesce secondary lcores. rte_rcu_qsbr_thread_register is only called from mana_dev_configure, which the secondary never runs. The secondary's rx_burst/tx_burst still call thread_online/offline against the shared qsv, but the secondary tids are unregistered, so they are invisible to rte_rcu_qsbr_check in the primary, and the secondary MP handler (mana_mp_reset_enter) does not call qsbr_check at all -- it just sets db_page = NULL and munmaps. The dev_state check at the top of secondary tx_burst is racy: the page can be munmapped after the in-loop read of db_page but before the doorbell write at the bottom. The "All secondary threads are quiescent" log line in mana_mp_reset_enter is not true. The secondary needs a real reader-side primitive -- its own qsbr with secondary lcore registration, or an rwlock the MP handler takes before munmap. 4. Double-join race between dev_close_lock and mana_reset_enter. dev_close_lock signals + joins the reset thread *before* taking reset_ops_lock. intr_handler may fire in that window: 1. dev_close_lock: pthread_cond_signal, rte_thread_join 2. intr_handler: take lock, see ACTIVE, call reset_enter 3. reset_enter: reset_thread_active still true (step 1 hasn't cleared it yet) -> rte_thread_join on a thread that step 1 just joined -> UB The pre-lock signal/join was added to avoid deadlock with the reset thread holding the lock, which is fine in itself, but the reset_thread_active update needs to be inside that critical section, or reset_enter has to recheck under the lock. 5. mana_dev_close (non-_lock) does not join the reset thread. mana_dev_uninit -> mana_dev_close path doesn't go through dev_close_lock, so if pci_remove arrives while the reset thread is alive, mana_dev_free_resources can destroy reset_cond_mutex / reset_cond before the thread has exited. In practice the pci_remove callback signals RESET_FAILED and the reset thread should exit quickly, but the ordering isn't enforced. Either join in mana_dev_uninit, or document the assumption. Info 6. mana_reset_exit_delay sets priv->ib_ctx = NULL even when ibv_close_device fails: ret = ibv_close_device(priv->ib_ctx); priv->ib_ctx = NULL; if (ret) { ... goto out; } The handle is leaked if close fails. Either keep the old pointer on failure or accept that close-failure is unrecoverable and free it anyway with a comment. Patch 2/2 7. The events list is a term/description pattern and should be a definition list per DPDK RST style: ``RTE_ETH_EVENT_ERR_RECOVERING`` Reset has started. ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` Device has recovered successfully. ``RTE_ETH_EVENT_RECOVERY_FAILED`` Recovery failed. 8. Convention is to fold doc + release note updates into the same commit as the feature. Patches 1/2 and 2/2 can be squashed.

