On Thu, 18 Jun 2026 10:44:35 -0400 Dawid Wesierski <[email protected]> wrote:
> From: Marek Kasiewicz <[email protected]> > > This series introduces several improvements to Intel iavf and ice > drivers, including a new ethdev API for header split mbuf callbacks, > increased ring descriptors, and improved PTP timestamping. > > Marek Kasiewicz (7): > ethdev: add header split mbuf callback API > net/iavf: increase max ring descriptors to hardware limit > net/iavf: allow runtime queue rate limit configuration > net/ice/base: reduce default scheduler burst size > net/ice: timestamp all received packets when PTP is enabled > net/iavf: disable runtime queue setup capability > net/intel: support header split mbuf callback > > drivers/net/intel/common/rx.h | 2 + > drivers/net/intel/iavf/iavf_ethdev.c | 3 -- > drivers/net/intel/iavf/iavf_rxtx.h | 2 +- > drivers/net/intel/iavf/iavf_tm.c | 11 ++-- > drivers/net/intel/ice/base/ice_type.h | 2 +- > drivers/net/intel/ice/ice_ethdev.c | 1 + > drivers/net/intel/ice/ice_rxtx.c | 72 ++++++++++++++++++++++++--- > drivers/net/intel/ice/ice_rxtx.h | 2 + > lib/ethdev/ethdev_driver.h | 15 +++++++ > lib/ethdev/rte_ethdev.c | 51 ++++++++++++++++++++++ > lib/ethdev/rte_ethdev.h | 7 +++ > 11 files changed, 153 insertions(+), 15 deletions(-) > AI review found lots of problems with the implementation as well. Series composition This v2 series bundles seven patches under one cover letter, but only 1/7 (ethdev API) and 7/7 (net/intel) implement the header-split mbuf callback. Patches 2/7 (iavf max ring desc), 3/7 (iavf queue rate limit), 4/7 (ice base burst size), 5/7 (ice PTP timestamp), and 6/7 (iavf runtime queue setup) are unrelated driver changes. Please split these into a separate series so each can be reviewed and merged on its own merits; mixing them obscures the dependency and complicates bisect. [PATCH v2 1/7] ethdev: add header split mbuf callback API Warning: struct rte_eth_hdrs_mbuf carries only buf_addr and buf_iova, no length. The callback therefore cannot tell the PMD how large the application buffer is. In buffer-split mode the NIC DMAs up to the queue's configured payload buffer size; if the application buffer is smaller, the hardware writes past it. Add a length field and have the PMD (and/or ethdev) validate it against the configured payload size. Warning: the buffer lifetime and ownership are undocumented. The Doxygen does not state who owns the buffer after the payload mbuf is freed, whether the address is consumed as-is or with headroom applied (the 7/7 implementation adds RTE_PKTMBUF_HEADROOM to the IOVA -- see below), or that the callback must return a distinct buffer per call. These are contract details an application cannot guess. Warning: new experimental API with no testpmd hook and no functional test in app/test. Both are required for new ethdev API. Warning: new experimental public API but no release notes entry (doc/guides/rel_notes/release_26_07.rst). The series touches only the three code files. Info: rte_eth_hdrs_set_mbuf_callback() validates port_id and the hdrs_mbuf_set_cb hook, but passes rx_queue_id to the PMD without checking it against dev->data->nb_rx_queues. Sibling per-queue ethdev calls (e.g. rte_eth_rx_queue_setup) validate the queue id at the ethdev layer. The ICE PMD does check it in 7/7, but validating in the wrapper would be consistent and would protect future PMDs that forget. [PATCH v2 7/7] net/intel: support header split mbuf callback Error: overwriting buf_addr/buf_iova on a pool mbuf corrupts the payload mempool. At all three sites (ice_alloc_rx_queue_mbufs, ice_rx_alloc_bufs, ice_recv_pkts) the payload mbuf is allocated from rxq->rxseg[1].mp and then has its buf_addr/buf_iova rewritten to point at application memory: if (ret >= 0) { mbuf_pay->buf_addr = hdrs_mbuf.buf_addr; mbuf_pay->buf_iova = hdrs_mbuf.buf_iova; } DPDK never restores buf_addr/buf_iova on free. When these mbufs are returned to rxseg[1].mp (via rte_pktmbuf_free of the received chain, or _ice_rx_queue_release_mbufs on queue stop), the pool objects retain the foreign buffer pointer, and buf_len still describes the original mempool buffer. The next rte_mbuf_raw_alloc()/raw_alloc_bulk() from that pool hands back an mbuf pointing at application memory. The correct mechanism for pointing an mbuf at non-pool memory is rte_pktmbuf_attach_extbuf() (sets buf_addr/buf_iova/buf_len, the EXTERNAL flag, and a free callback), or a dedicated external-buffer mempool. As written the payload pool is progressively poisoned. Error: the callback's failure return is swallowed. On ret < 0 the code skips the address update but still programs the descriptor from the mbuf's current buf_iova: rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay)); Because of the pool corruption above, a recycled mbuf's buf_iova may point at a buffer the application already reclaimed, so a failed callback silently arms the NIC to DMA into stale/foreign memory. A callback failure should abort the refill (or fall back to a known-good pool buffer), not fall through. Separately, the documented contract is "0 on success, negative on failure"; the test should be ret == 0, not ret >= 0. Warning: RTE_PKTMBUF_HEADROOM is silently applied to the supplied IOVA. The descriptor is programmed with rte_mbuf_data_iova_default(), i.e. buf_iova + data_off (128), but the API documents buf_iova as "the IOVA of the payload buffer". The NIC therefore writes 128 bytes into the application buffer, and the tail of a buffer sized to the payload overruns by the headroom. Either document that the headroom offset is applied, or program pkt_addr from the raw buf_iova for callback- provided buffers.

