02/06/2026 15:53, Stephen Hemminger: > On Fri, 29 May 2026 15:34:00 +0200 > Thomas Monjalon <[email protected]> wrote: > > > From: Gregory Etelson <[email protected]> > > > > Selective Rx may save some PCI bandwidth. > > Implement selective Rx in the (quite slow) scalar SPRQ Rx path > > mlx5_rx_burst() where the performance impact > > of the added condition branches is acceptable. > > Other Rx functions do not support this feature. > > When using selective Rx, mlx5_rx_burst will be selected. > > > > A null Memory Region (MR) is always allocated > > at shared device context initialization. > > The selective Rx capability is not advertised > > if this special MR allocation fails. > > > > For each Rx segment configured with a NULL mempool, > > a "null mbuf" is created. > > It is a fake mbuf allocated outside any mempool, > > used as a placeholder in the Rx ring. > > The null MR lkey is used in the WQE for these segments > > so the NIC writes received data to a discard buffer. > > The mbuf data room size is resolved from the first segment having a pool. > > For null segments, the buffer length is from the last seen pool, > > so that the WQE stride size remains consistent. > > > > In mlx5_rx_burst, discarded segments are not chained > > into the packet mbuf list, NB_SEGS is decremented accordingly, > > and no replacement buffer is allocated. > > A separate data_seg_len accumulator tracks the total length > > of delivered segments only. > > The packet length is adjusted to reflect only the data > > actually delivered to the application. > > > > Signed-off-by: Gregory Etelson <[email protected]> > > Signed-off-by: Thomas Monjalon <[email protected]> > > --- > > AI review with Opus 4.8 and High setting found one issue: > > Patch 6: net/mlx5: support selective Rx > > Error: NULL pointer dereference when the first configured Rx segment is a > discard segment (mp == NULL). > > In mlx5_rx_burst() the head mbuf and the chain tail are tracked like this: > > if (pkt) { > if (rep->pool) > NEXT(tail) = rep; > else > --NB_SEGS(pkt); > } > ... > if (seg->pool) { > tail = seg; > ... > } > > tail is only ever assigned inside "if (seg->pool)", and pkt is set to the > first processed segment unconditionally (pkt = seg in the !pkt block, no > pool guard). So if the first segment of a packet is a discard segment: > > pkt becomes the null_mbuf (pool == NULL), tail stays NULL; > on the next (real) segment, rep->pool is set, so NEXT(tail) = rep > executes with tail == NULL -> write through NULL. > > Even without the crash, returning the pool-less null_mbuf as the packet > head is wrong: the application later frees it back to a NULL pool. > > This is reachable, not theoretical. testpmd (patch 3) inserts a leading > mp==NULL segment whenever the first offset is > 0 (seg_offset > next_offset > with next_offset starting at 0), ethdev check_split (patch 2) now permits a > leading NULL mp, and mlx5_rxq_new() accepts it (first_mp is just the first > non-NULL pool; there is no requirement that rxseg[0].mp != NULL). The DTS > cases selective_rx_payload_only (rxoffs=[34]) and selective_rx_two_segments > (rxoffs=[14,...]) in patch 10 configure exactly this layout, and > mlx5_selective_rx_enabled() forces the scalar mlx5_rx_burst path, so the > buggy path is the one that runs. > > Trace for rxoffs=34 / rxpkts=payload (segments: discard[0,34) real[34,290) > discard[290,max)): > > iter0 (discard head): pkt == NULL, seg->pool == NULL -> pkt = null_mbuf, > tail not set; len(290) > DATA_LEN(34) -> ++NB_SEGS, continue. > iter1 (real seg): pkt set, rep->pool != NULL -> NEXT(tail==NULL)=rep. > > Suggested fix: a discard segment must not become the packet head/tail. > Either reject rxseg[0].mp == NULL in mlx5_rxq_new() (cleanest, matches the > "deliver last N bytes" case being unsupported here), or make the data path > skip leading discard segments without assigning them to pkt and only set > pkt/tail on the first segment with a pool. If leading discard is intended > to be supported, the head selection and NEXT(tail) linking both need to > account for tail == NULL. > > The same head/tail assumption also means a packet that falls entirely > within a leading discard segment would be returned with a NULL-pool head; > fixing the above covers that too.
Yes sorry for missing that. I fix it by assigning packet after the first real segment. I take this opportunity to add few comments on the branching conditions.

