
Hi Morten,

We have tested the effect of the patch using func-latency and packet rate (Mpps) measurements via testpmd.
Please find our observations below:

 - DPDK tag: 25.07-rc1
 - compiler: gcc 14.2
 - platform: AMD EPYC 8534P (64 cores, 2.3 GHz)
 - app cmd:
 -- One port: `sudo build/app/dpdk-testpmd -l 15,16 --vdev=net_null1 --no-pci -- --nb-cores=1 --nb-ports=1 --txq=1 --rxq=1 --txd=2048 --rxd=2048 -a --forward-mode=io --stats-period=1`
 -- Two ports: `sudo build/app/dpdk-testpmd -l 15,16,17 --vdev=net_null1 --vdev=net_null2 --no-pci -- --nb-cores=2 --nb-ports=2 --txq=1 --rxq=1 --txd=2048 --rxd=2048 -a --forward-mode=io --stats-period=1`

Result, 1 port:
 - Before patch: TX 117.61 Mpps, RX 117.67 Mpps, func-latency TX: 1918 ns, func-latency free-bulk: 2667 ns
 - After patch: TX 117.55 Mpps, RX 117.54 Mpps, func-latency TX: 1921 ns, func-latency free-bulk: 2660 ns

Result, 2 ports:
 - Before patch: TX 117.61 Mpps, RX 117.67 Mpps, func-latency TX: 1942 ns, func-latency free-bulk: 2557 ns
 - After patch: TX 117.54 Mpps, RX 117.54 Mpps, func-latency TX: 1946 ns, func-latency free-bulk: 2740 ns

Perf top diff (before vs. after the patch): 13.84% vs. 13.79%.

Reviewed-by: Thiyagarajan P <thiyagaraja...@amd.com>
Tested-by: Vipin Varghese <vipin.vargh...@amd.com>

Clarification request: with fast mbuf free on a single port, we see the free-bulk 
latency reduced by 7 ns, but the null_tx latency increased by 3 ns and the TX rate 
reduced by 0.07 Mpps. Is this an anomaly of the net_null PMD?
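
For context on how we read the patch, here is a minimal sketch of the fast mbuf 
release TX path as we understand it, based only on the mb_pool comment in the 
quoted patch below. This is our interpretation, not the actual patch code; 
struct null_queue_sketch and sketch_null_tx are placeholder names, and the real 
driver updates its counters atomically.

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Simplified stand-in for the driver's TX queue structure. */
struct null_queue_sketch {
	/* NULL: fast release disabled; UINTPTR_MAX: pool not yet detected;
	 * otherwise: the mempool to return TX mbufs to. */
	struct rte_mempool *mb_pool;
	uint64_t tx_pkts;
};

static uint16_t
sketch_null_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
{
	struct null_queue_sketch *h = q;

	if (q == NULL || bufs == NULL)
		return 0;

	if (h->mb_pool != NULL) {
		/* RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE: the application guarantees
		 * all mbufs on this queue come from one mempool and have
		 * refcnt == 1; this sketch also assumes non-segmented mbufs,
		 * so the whole array can be returned in one bulk put. */
		if (h->mb_pool == (struct rte_mempool *)UINTPTR_MAX)
			h->mb_pool = bufs[0]->pool; /* detect the pool on the first burst */
		rte_mempool_put_bulk(h->mb_pool, (void **)bufs, nb_bufs);
	} else {
		/* Regular path: free each mbuf, honoring segments and refcounts. */
		rte_pktmbuf_free_bulk(bufs, nb_bufs);
	}

	h->tx_pkts += nb_bufs;
	return nb_bufs;
}

If this reading is wrong, please correct us; it is the basis for the 
clarification question above.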

> >
> > On Tue, 24 Jun 2025 18:14:16 +0000
> > Morten Brørup <m...@smartsharesystems.com> wrote:
> >
> > > Added fast mbuf release, re-using the existing mbuf pool pointer in
> > > the queue structure.
> > >
> > > Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> >
> > Makes sense.
> >
> > > ---
> > >  drivers/net/null/rte_eth_null.c | 30 +++++++++++++++++++++++++++---
> > >  1 file changed, 27 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
> > > index 8a9b74a03b..12c0d8d1ff 100644
> > > --- a/drivers/net/null/rte_eth_null.c
> > > +++ b/drivers/net/null/rte_eth_null.c
> > > @@ -34,6 +34,17 @@ struct pmd_internals;
> > >  struct null_queue {
> > >     struct pmd_internals *internals;
> > >
> > > +   /**
> > > +    * For RX queue:
> > > +    *  Mempool to allocate mbufs from.
> > > +    *
> > > +    * For TX queue:
> > > +    *  Mempool to free mbufs to, if fast release of mbufs is enabled.
> > > +    *  UINTPTR_MAX if the mempool for fast release of mbufs has not yet been detected.
> > > +    *  NULL if fast release of mbufs is not enabled.
> > > +    *
> > > +    *  @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > > +    */
> > >     struct rte_mempool *mb_pool;
> >
> > Do all drivers do it this way?
>
> No, I think most drivers have separate structures for rx and tx queues.
> This driver doesn't, so I'm reusing the existing mempool pointer.
> Also, they don't cache the mempool pointer, but look at mbuf[0].pool at
> every burst; so their tx queue structure doesn't have a mempool pointer field.
> And they check an offload flag (either the bit in the raw offload field,
> or a shadow variable for the relevant offload flag), instead of checking
> the mempool pointer.
>
> Other drivers can be improved, and I have submitted an optimization patch
> for the i40e driver with some of the things I do in this patch:
> https://inbox.dpdk.org/dev/20250624061238.89259-1-m...@smartsharesystems.com/
>
> > Is it documented in ethdev?
>
> The RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE flag is documented.
> How to implement it in the drivers is not.
>
> -Morten
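
On the point above that RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE is documented on the 
ethdev side while the driver-side implementation is not: for reference, this is 
roughly how an application requests the offload. A minimal sketch; the 
enable_fast_free name and the error handling are ours, only the ethdev calls and 
the capability check come from the documented API.

#include <rte_ethdev.h>

static int
enable_fast_free(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_conf port_conf = { 0 };
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	/* Request fast mbuf release only if the PMD advertises support. */
	if (dev_info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)
		port_conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;

	return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);
}

The application must then honor the offload's contract as documented in ethdev: 
per TX queue, all mbufs come from the same mempool and have a reference count of 1.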
