RE: [EXTERNAL] Re: [PATCH v4 0/7] fix multi-process VF hotplug

Long Li Thu, 26 Feb 2026 17:03:08 -0800

> Looks to be in good shape, you may want to address these errors flagged by AI
> review.
> 
> Critical Finding Summary
> Patch Severity        Issue
> 5, 6, 7       Error   START_RXTX handler does not restore rte_eth_fp_ops burst
> function pointers after STOP clears them — secondary Rx/Tx broken after
> STOP+START cycle


Will fix this. The v1 version is correct but this was introduced in later 
versions.

> 3     Warning PD deallocation order: parent freed before child may
> cause EBUSY; reverse the order

This is a fault alarm. Parent PD should be deleted first.

> 1     Warning switch_data_path failure leaves VF attached/started but
> unused

This is by design and continue with the existing behavior.

> 1     Warning eth_hn_dev_init ignores hn_vf_add_unlocked error (pre-
> existing)

This is by design and continue with the existing behavior.

I'm sending v5 for fixing 5, 6, 7.

Long

> 2     Info    secondary_cnt atomics use acquire/release where relaxed
> suffices
> 
> 
> # DPDK VF Hotplug Patch Series v4 Review
> 
> **Series**: [PATCH v4 1/7] through [PATCH v4 7/7]
> **Author**: Long Li <[email protected]> **Reviewed against**: AGENTS.md
> DPDK Code Review Guidelines
> 
> ---
> 
> ## Summary
> 
> This series addresses race conditions and resource management issues in the
> netvsc/mana/mlx5/mlx4 drivers around VF hotplug, multi-process coordination,
> and fast-path ops setup. The locking rework in patch 1 is substantial and 
> well-
> structured. The multi-process infrastructure in patch 2 is a significant 
> addition.
> Commit messages throughout are excellent — clear root cause analysis, well-
> documented change rationale, and honest notes about limitations.
> 
> Subject lines are all within 60 characters, Fixes tags all have 12-character 
> SHAs,
> tag ordering is correct throughout, and commit body line wrapping is clean.
> 
> ---
> 
> ## Patch 1: net/netvsc: fix race conditions on VF add/remove events
> 
> ### Correctness analysis
> 
> The locking redesign is solid:
> 
> - **Rx/Tx fast path**: Moving the lock acquisition outside the 
> `vf_vsc_switched`
> check eliminates the TOCTOU race where the flag was checked without the lock,
> then the VF could be removed before the lock was acquired. The old double-
> check pattern was racy.
> 
> - **`hn_vf_close`**: Upgrading from `rte_rwlock_read_lock` to
> `rte_rwlock_write_lock` is correct — the function modifies `vf_attached`, 
> which is
> a write operation.
> 
> - **`hn_vf_detach`/`fresh_attach` pattern**: On `-EEXIST` from `hn_vf_attach`,
> the `fresh_attach = false` flag correctly prevents `hn_vf_detach` from tearing
> down a previously valid VF attachment on configure/start failure. This is a 
> subtle
> and well-handled case.
> 
> - **Callback registration moved to `hn_vf_attach`**: Registering the RMV
> callback immediately on attach (with rollback on failure) is cleaner than the 
> old
> placement in `hn_vf_configure`.
> 
> ### Warning: `hn_vf_add_unlocked` does not clean up VF on `switch_data_path`
> failure
> 
> If `hn_nvs_set_datapath(NVS_DATAPATH_VF)` fails at the `switch_data_path`
> label, the function returns the error but leaves the VF attached and possibly
> started. The VF sits running but unused (traffic goes through synthetic 
> path). The
> self-healing retry mechanism (re-entry through `hn_nvs_handle_vfassoc` or
> `netvsc_hotplug_retry`) would eventually retry, so this is not a crash bug, 
> but a
> `goto detach` on data path switch failure would be cleaner and avoid the VF
> running in a wasted state.
> 
> ### Warning: `eth_hn_dev_init` silently ignores `hn_vf_add_unlocked` error
> 
> ```c
> rte_rwlock_write_lock(&hv->vf_lock);
> if (hv->vf_ctx.vf_vsp_reported && !hv->vf_ctx.vf_vsc_switched) {
>     PMD_INIT_LOG(DEBUG, "Adding VF device");
>     err = hn_vf_add_unlocked(eth_dev, hv); } rte_rwlock_write_unlock(&hv-
> >vf_lock);
> 
> return 0;  /* err is ignored */
> ```
> 
> The `err` from `hn_vf_add_unlocked` is stored but never checked — the function
> always returns 0. This is pre-existing behavior, but worth noting as it could 
> mask
> VF setup failures during init.
> 
> ---
> 
> ## Patch 2: net/netvsc: add multi-process VF device removal support
> 
> ### Correctness analysis
> 
> The multi-process infrastructure is well-designed:
> 
> - **Shared memzone** for `secondary_cnt` with proper atomic operations
> - **Spinlock** protecting the one-time init/uninit sequence
> - **`rte_mp_request_sync`** with timeout for reliable cross-process 
> coordination
> - **ENOTSUP** handling for single-process mode
> 
> ### Info: Memory ordering stronger than necessary
> 
> The `secondary_cnt` atomic uses `rte_memory_order_release` for stores and
> `rte_memory_order_acquire` for loads. Since this counter doesn't guard any 
> other
> shared data (it's just a "should I bother sending MP requests?" optimization),
> `rte_memory_order_relaxed` would suffice. The current ordering is correct but
> unnecessarily strong.
> 
> ### Info: `netvsc_mp_primary_handle` is a stub
> 
> ```c
> static int
> netvsc_mp_primary_handle(const struct rte_mp_msg *mp_msg __rte_unused,
>                           const void *peer __rte_unused) {
>     /* Stub function required for multi-process message handling registration 
> */
>     return 0;
> }
> ```
> 
> The comment explains this is needed for registration. This is fine — the 
> primary
> currently only sends, never receives. If future message types require primary
> handling, this stub is the extension point.
> 
> ---
> 
> ## Patch 3: net/mana: fix PD resource leak on device close
> 
> ### Warning: PD deallocation order may be wrong
> 
> The code frees `ib_parent_pd` before `ib_pd`:
> 
> ```c
> if (priv->ib_parent_pd) {
>     int err = ibv_dealloc_pd(priv->ib_parent_pd);
>     ...
> }
> if (priv->ib_pd) {
>     int err = ibv_dealloc_pd(priv->ib_pd);
>     ...
> }
> ```
> 
> If `ib_pd` is a child/derived PD of `ib_parent_pd`, 
> `ibv_dealloc_pd(ib_parent_pd)`
> will fail with EBUSY because the child still holds a reference. The error is 
> logged
> and the code continues, so `ib_pd` gets freed next, and `ibv_close_device` 
> cleans
> up the orphaned parent PD. This works but is fragile.
> 
> Reversing the order (free `ib_pd` first, then `ib_parent_pd`) would be 
> cleaner and
> avoid the EBUSY error on the parent. If the PDs are independent (no 
> parent-child
> verbs relationship), the order doesn't matter, but the naming strongly 
> suggests a
> hierarchy.
> 
> ---
> 
> ## Patch 4: net/netvsc: fix devargs memory leak on hotplug
> 
> No issues. The `rte_devargs_reset()` calls are correctly placed in both 
> cleanup
> paths before `free(hot_ctx)`.
> 
> ---
> 
> ## Patches 5, 6, 7: net/mana, net/mlx5, net/mlx4: fix fast-path ops setup in
> secondary process
> 
> ### Error: `rte_eth_fp_ops` burst function pointers not updated in START_RXTX
> handler
> 
> All three patches update `rte_eth_fp_ops[].rxq.data` and `txq.data` in the 
> START
> handler, and update `rte_eth_fp_ops[].rx_pkt_burst` and `tx_pkt_burst` in the
> STOP handler. But the START handler does **not** restore the burst function
> pointers.
> 
> After a STOP → START cycle:
> 1. STOP sets `rte_eth_fp_ops[port].rx_pkt_burst = dummy` (newly added by this
> patch) 2. START sets `rte_eth_fp_ops[port].rxq.data = dev->data->rx_queues` 
> but
> does NOT restore `rx_pkt_burst` 3. Secondary's `rte_eth_rx_burst()` calls
> `rte_eth_fp_ops[port].rx_pkt_burst(...)` which is still the dummy → **Rx/Tx
> broken**
> 
> Since `rte_eth_rx_burst()` uses `rte_eth_fp_ops` (process-local), not `dev-
> >rx_pkt_burst` (shared), the secondary cannot receive or transmit after a
> STOP+START cycle even though `dev->rx_pkt_burst` was correctly set.
> 
> This may not manifest on the **first** start (where
> `rte_eth_dev_probing_finish()` sets fp_ops during probe), and it may be 
> masked if
> the VF is re-probed during hot-add. But it will manifest on any in-place
> STOP+START cycle without re-probe.
> 
> **Suggested fix** — add to each START_RXTX handler:
> 
> For patch 5 (mana):
> ```c
> rte_eth_fp_ops[param->port_id].rx_pkt_burst = mana_rx_burst;
> rte_eth_fp_ops[param->port_id].tx_pkt_burst = mana_tx_burst; ```
> 
> For patch 6 (mlx5):
> ```c
> rte_eth_fp_ops[param->port_id].rx_pkt_burst = dev->rx_pkt_burst;
> rte_eth_fp_ops[param->port_id].tx_pkt_burst = dev->tx_pkt_burst; ```
> 
> For patch 7 (mlx4), same as mlx5.
> 
> Place these **before** the `rte_mb()` call so the barrier covers all fp_ops
> updates.
> 
> ---
> 
> ## Critical Finding Summary
> 
> | Patch | Severity | Issue |
> |-------|----------|-------|
> | 5, 6, 7 | **Error** | START_RXTX handler does not restore
> | `rte_eth_fp_ops` burst function pointers after STOP clears them —
> | secondary Rx/Tx broken after STOP+START cycle |
> | 3 | Warning | PD deallocation order: parent freed before child may
> | cause EBUSY; reverse the order |
> | 1 | Warning | `switch_data_path` failure leaves VF attached/started
> | but unused |
> | 1 | Warning | `eth_hn_dev_init` ignores `hn_vf_add_unlocked` error
> | (pre-existing) |
> | 2 | Info | `secondary_cnt` atomics use acquire/release where relaxed
> | suffices |

RE: [EXTERNAL] Re: [PATCH v4 0/7] fix multi-process VF hotplug

Reply via email to