On Thu, 18 Jun 2026 10:44:35 -0400
Dawid Wesierski <[email protected]> wrote:

> From: Marek Kasiewicz <[email protected]>
> 
> This series introduces several improvements to Intel iavf and ice
> drivers, including a new ethdev API for header split mbuf callbacks,
> increased ring descriptors, and improved PTP timestamping.
> 
> Marek Kasiewicz (7):
>   ethdev: add header split mbuf callback API
>   net/iavf: increase max ring descriptors to hardware limit
>   net/iavf: allow runtime queue rate limit configuration
>   net/ice/base: reduce default scheduler burst size
>   net/ice: timestamp all received packets when PTP is enabled
>   net/iavf: disable runtime queue setup capability
>   net/intel: support header split mbuf callback
> 
>  drivers/net/intel/common/rx.h         |  2 +
>  drivers/net/intel/iavf/iavf_ethdev.c  |  3 --
>  drivers/net/intel/iavf/iavf_rxtx.h    |  2 +-
>  drivers/net/intel/iavf/iavf_tm.c      | 11 ++--
>  drivers/net/intel/ice/base/ice_type.h |  2 +-
>  drivers/net/intel/ice/ice_ethdev.c    |  1 +
>  drivers/net/intel/ice/ice_rxtx.c      | 72 ++++++++++++++++++++++++---
>  drivers/net/intel/ice/ice_rxtx.h      |  2 +
>  lib/ethdev/ethdev_driver.h            | 15 +++++++
>  lib/ethdev/rte_ethdev.c               | 51 ++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h               |  7 +++
>  11 files changed, 153 insertions(+), 15 deletions(-)
> 

AI review found lots of problems with the implementation as well.

Series composition

This v2 series bundles seven patches under one cover letter, but only
1/7 (ethdev API) and 7/7 (net/intel) implement the header-split mbuf
callback. Patches 2/7 (iavf max ring desc), 3/7 (iavf queue rate
limit), 4/7 (ice base burst size), 5/7 (ice PTP timestamp), and 6/7
(iavf runtime queue setup) are unrelated driver changes. Please split
these into a separate series so each can be reviewed and merged on its
own merits; mixing them obscures the dependency and complicates bisect.


[PATCH v2 1/7] ethdev: add header split mbuf callback API

Warning: struct rte_eth_hdrs_mbuf carries only buf_addr and buf_iova,
no length. The callback therefore cannot tell the PMD how large the
application buffer is. In buffer-split mode the NIC DMAs up to the
queue's configured payload buffer size; if the application buffer is
smaller, the hardware writes past it. Add a length field and have the
PMD (and/or ethdev) validate it against the configured payload size.

Warning: the buffer lifetime and ownership are undocumented. The
Doxygen does not state who owns the buffer after the payload mbuf is
freed, whether the address is consumed as-is or with headroom applied
(the 7/7 implementation adds RTE_PKTMBUF_HEADROOM to the IOVA -- see
below), or that the callback must return a distinct buffer per call.
These are contract details an application cannot guess.

Warning: new experimental API with no testpmd hook and no functional
test in app/test. Both are required for new ethdev API.

Warning: new experimental public API but no release notes entry
(doc/guides/rel_notes/release_26_07.rst). The series touches only the
three code files.

Info: rte_eth_hdrs_set_mbuf_callback() validates port_id and the
hdrs_mbuf_set_cb hook, but passes rx_queue_id to the PMD without
checking it against dev->data->nb_rx_queues. Sibling per-queue ethdev
calls (e.g. rte_eth_rx_queue_setup) validate the queue id at the
ethdev layer. The ICE PMD does check it in 7/7, but validating in the
wrapper would be consistent and would protect future PMDs that forget.


[PATCH v2 7/7] net/intel: support header split mbuf callback

Error: overwriting buf_addr/buf_iova on a pool mbuf corrupts the
payload mempool. At all three sites (ice_alloc_rx_queue_mbufs,
ice_rx_alloc_bufs, ice_recv_pkts) the payload mbuf is allocated from
rxq->rxseg[1].mp and then has its buf_addr/buf_iova rewritten to point
at application memory:

        if (ret >= 0) {
                mbuf_pay->buf_addr = hdrs_mbuf.buf_addr;
                mbuf_pay->buf_iova = hdrs_mbuf.buf_iova;
        }

DPDK never restores buf_addr/buf_iova on free. When these mbufs are
returned to rxseg[1].mp (via rte_pktmbuf_free of the received chain,
or _ice_rx_queue_release_mbufs on queue stop), the pool objects retain
the foreign buffer pointer, and buf_len still describes the original
mempool buffer. The next rte_mbuf_raw_alloc()/raw_alloc_bulk() from
that pool hands back an mbuf pointing at application memory. The
correct mechanism for pointing an mbuf at non-pool memory is
rte_pktmbuf_attach_extbuf() (sets buf_addr/buf_iova/buf_len, the
EXTERNAL flag, and a free callback), or a dedicated external-buffer
mempool. As written the payload pool is progressively poisoned.

Error: the callback's failure return is swallowed. On ret < 0 the code
skips the address update but still programs the descriptor from the
mbuf's current buf_iova:

        rxdp->read.pkt_addr =
                rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));

Because of the pool corruption above, a recycled mbuf's buf_iova may
point at a buffer the application already reclaimed, so a failed
callback silently arms the NIC to DMA into stale/foreign memory. A
callback failure should abort the refill (or fall back to a known-good
pool buffer), not fall through. Separately, the documented contract is
"0 on success, negative on failure"; the test should be ret == 0, not
ret >= 0.

Warning: RTE_PKTMBUF_HEADROOM is silently applied to the supplied
IOVA. The descriptor is programmed with rte_mbuf_data_iova_default(),
i.e. buf_iova + data_off (128), but the API documents buf_iova as "the
IOVA of the payload buffer". The NIC therefore writes 128 bytes into
the application buffer, and the tail of a buffer sized to the payload
overruns by the headroom. Either document that the headroom offset is
applied, or program pkt_addr from the raw buf_iova for callback-
provided buffers.

Reply via email to