Hi Dariusz,

Sure! Here is the stack trace:

#0  0x0000000001c34912 in __rte_pktmbuf_free_extbuf ()
#1  0x0000000001c36a10 in rte_pktmbuf_detach ()
#2  0x0000000001c4a9ec in rxq_copy_mprq_mbuf_v ()
#3  0x0000000001c4d63b in rxq_burst_mprq_v ()
#4  0x0000000001c4d7a7 in mlx5_rx_burst_mprq_vec ()
#5  0x000000000050be66 in rte_eth_rx_burst ()
#6  0x000000000050c53d in pkt_burst_io_forward ()
#7  0x00000000005427b4 in run_pkt_fwd_on_lcore ()
#8  0x000000000054289b in start_pkt_forward_on_core ()
#9  0x0000000000a473c9 in eal_thread_loop ()
#10 0x00007ffff60061ca in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff5c72e73 in clone () from /lib64/libc.so.6

I've raised the bugs as instructed (IDs 1776, 1777, 1778 and 1779) and
included the stack trace there too.
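In case it helps with triage, below are rough, untested sketches of the
directions I had in mind, written against the 22.11 sources quoted
further down. For the stride overrun in rxq_copy_mprq_mbuf_v(), a simple
guard would at least catch the corruption where it happens rather than
in a later mbuf free:

	/* Untested sketch for rxq_copy_mprq_mbuf_v(): catch a packet
	 * consuming strides past the end of the current MPRQ buffer
	 * (strd_n strides per buffer) at the point of corruption. */
	rxq_code = mprq_buf_to_pkt(rxq, elts[i], elts[i]->pkt_len, buf,
				   rxq->consumed_strd, strd_cnt);
	rxq->consumed_strd += strd_cnt;
	MLX5_ASSERT(rxq->consumed_strd <= strd_n);

This does not fix the underlying CQE misalignment, but it would turn the
silent overrun into an assertion failure in debug builds.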
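For the segment leak in the strd_scatter_en path of mprq_buf_to_pkt(),
the error path could free whatever has already been chained to pkt
before returning. This is an untested sketch; it assumes the elided code
links each new segment to pkt via its next pointer, so that
rte_pktmbuf_free() can release the whole tail chain in one call:

	while (rem_len) {
		struct rte_mbuf *next = rte_pktmbuf_alloc(rxq->mp);

		if (unlikely(next == NULL)) {
			/* Untested sketch: free the segments already
			 * chained to pkt so they are not leaked when
			 * the caller recycles pkt on NOMBUF. */
			if (pkt->next != NULL) {
				rte_pktmbuf_free(pkt->next);
				pkt->next = NULL;
				pkt->nb_segs = 1;
			}
			return MLX5_RXQ_CODE_NOMBUF;
		}
		...
	}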
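Similarly, for the missing refcnt decrement in the hdrm_overlap > 0
path, the error path could drop the reference it took on the MPRQ
buffer a few lines earlier (again untested):

	if (hdrm_overlap > 0) {
		__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
		...
		struct rte_mbuf *seg = rte_pktmbuf_alloc(rxq->mp);

		if (unlikely(seg == NULL)) {
			/* Untested sketch: drop the reference taken
			 * above, otherwise the MPRQ buffer can never
			 * be returned to its mempool. */
			__atomic_sub_fetch(&buf->refcnt, 1,
					   __ATOMIC_RELAXED);
			return MLX5_RXQ_CODE_NOMBUF;
		}
		...
	}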
With regards,
Joni

On Wed, Aug 20, 2025 at 8:04 PM Dariusz Sosnowski <dsosnow...@nvidia.com> wrote:
> Hi,
>
> On Wed, Aug 20, 2025 at 04:40:16PM +0800, Joni wrote:
> > Hi,
> >
> > I hope this is the correct place to report these issues, since they
> > seem to be related to the DPDK code. I reported this to Nvidia a few
> > days ago but have yet to receive any response from them.
> >
> > My server is currently using a ConnectX-5 MT27800 (mlx5_core
> > 5.7-1.0.2) on firmware 16.35.4506 (MT_0000000011). My DPDK library
> > version is 22.11.
> >
> > I ran the following testpmd command, which results in a segmentation
> > fault (I am currently running filtered traffic with packets > 1000
> > bytes to increase the odds of hitting the segmentation fault):
> >
> > dpdk-testpmd -l 1-5 -n 4 -a \
> > 0000:1f:00.0,rxq_comp_en=1,rxq_pkt_pad_en=1,rxqs_min_mprq=1,mprq_en=1,mprq_log_stride_num=6,mprq_log_stride_size=9,mprq_max_memcpy_len=64,rx_vec_en=1 \
> > -- -i --rxd=8192 --max-pkt-len=1700 --rxq=1 --total-num-mbufs=16384 \
> > --mbuf-size=3000 --enable-drop-en --enable-scatter
> >
> > The segmentation fault goes away when I disable vectorization
> > (rx_vec_en=0). (Note that the segmentation fault does not occur in
> > forward-mode=rxonly.) The segmentation fault also seems to happen
> > with higher probability when there is an rx_nombuf.
>
> Thank you for reporting and for the analysis.
>
> Could you please open a bug on https://bugs.dpdk.org/ with all the
> details?
>
> Do you happen to have a stack trace from the segmentation fault?
>
> Slava: Could you please take a look at the issue described by Joni in
> this mail?
>
> > Upon some investigation, I noticed that in DPDK's
> > drivers/net/mlx5/mlx5_rxtx_vec.c (function rxq_copy_mprq_mbuf_v()),
> > there is a possibility for the consumed stride count to exceed the
> > stride number (64 in this case), which should not happen. I suspect
> > there is some CQE misalignment here upon encountering rx_nombuf.
> >
> > rxq_copy_mprq_mbuf_v(...) {
> >         ...
> >         if (rxq->consumed_strd == strd_n) {
> >                 // replenish WQE
> >         }
> >         ...
> >         strd_cnt = (elts[i]->pkt_len / strd_sz) +
> >                    ((elts[i]->pkt_len % strd_sz) ? 1 : 0);
> >
> >         rxq_code = mprq_buf_to_pkt(rxq, elts[i], elts[i]->pkt_len,
> >                                    buf, rxq->consumed_strd, strd_cnt);
> >         rxq->consumed_strd += strd_cnt; // encountering cases where
> >                                         // rxq->consumed_strd > strd_n
> >         ...
> > }
> >
> > In addition, there were also cases in mprq_buf_to_pkt() where the
> > allocated seg address is exactly the same as the pkt (elts[i])
> > address passed in, which should not happen.
> >
> > mprq_buf_to_pkt(...) {
> >         ...
> >         if (hdrm_overlap > 0) {
> >                 MLX5_ASSERT(rxq->strd_scatter_en);
> >
> >                 struct rte_mbuf *seg = rte_pktmbuf_alloc(rxq->mp);
> >
> >                 if (unlikely(seg == NULL))
> >                         return MLX5_RXQ_CODE_NOMBUF;
> >                 SET_DATA_OFF(seg, 0);
> >
> >                 // added debug statement
> >                 DRV_LOG(DEBUG, "pkt %p seg %p",
> >                         (void *)pkt, (void *)seg);
> >
> >                 rte_memcpy(rte_pktmbuf_mtod(seg, void *),
> >                            RTE_PTR_ADD(addr, len - hdrm_overlap),
> >                            hdrm_overlap);
> >                 ...
> >         }
> > }
> >
> > I have tried upgrading my DPDK version to 24.11, but the segmentation
> > fault still persists.
> >
> > In addition, there were also a few other issues that I've noticed:
> >
> > - max-pkt-len does not seem to work for values < 1500, even though
> >   "show port info X" showed that the MTU was set to the value I
> >   passed in.
> > - In mprq_buf_to_pkt():
> >   - uint32_t seg_len = RTE_MIN(len, (uint32_t)(pkt->buf_len -
> >     RTE_PKTMBUF_HEADROOM)) seems unnecessary, as to hit this code,
> >     len has to be greater than (uint32_t)(pkt->buf_len -
> >     RTE_PKTMBUF_HEADROOM) due to the if condition.
> >   - If the allocation struct rte_mbuf *next =
> >     rte_pktmbuf_alloc(rxq->mp) fails and the packet has more than 2
> >     segs, the segs that were allocated previously do not get freed:
> >
> > mprq_buf_to_pkt(...) {
> >         ...
> >         } else if (rxq->strd_scatter_en) {
> >                 struct rte_mbuf *prev = pkt;
> >                 uint32_t seg_len = RTE_MIN(len, (uint32_t)
> >                         (pkt->buf_len - RTE_PKTMBUF_HEADROOM));
> >                 uint32_t rem_len = len - seg_len;
> >
> >                 rte_memcpy(rte_pktmbuf_mtod(pkt, void *), addr,
> >                            seg_len);
> >                 DATA_LEN(pkt) = seg_len;
> >                 while (rem_len) {
> >                         struct rte_mbuf *next =
> >                                 rte_pktmbuf_alloc(rxq->mp);
> >
> >                         if (unlikely(next == NULL))
> >                                 return MLX5_RXQ_CODE_NOMBUF;
> >                         ...
> >
> > - In the external buffer attach case where hdrm_overlap > 0, the
> >   code does not decrement the buffer refcnt if the allocation struct
> >   rte_mbuf *seg = rte_pktmbuf_alloc(rxq->mp) fails:
> >
> > mprq_buf_to_pkt(...) {
> >         ...
> >         if (hdrm_overlap > 0) {
> >                 __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
> >                 ...
> >                 MLX5_ASSERT(rxq->strd_scatter_en);
> >
> >                 struct rte_mbuf *seg = rte_pktmbuf_alloc(rxq->mp);
> >
> >                 if (unlikely(seg == NULL))
> >                         return MLX5_RXQ_CODE_NOMBUF;
> >                 SET_DATA_OFF(seg, 0);
> >                 ...
> >
> > Hope to hear from you soon!
> >
> > With regards,
> > Joni
>
> Best regards,
> Dariusz Sosnowski