RE: [PATCH] net/i40e: Fast release optimizations

Morten Brørup Tue, 01 Jul 2025 02:09:07 -0700

> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com]
> Sent: Tuesday, 1 July 2025 10.16


[...]

> I am talking about a different thing:
> I think that, with some extra effort, the driver can use (in some cases)
> rte_mbuf_raw_free_bulk() even when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> is not specified.
> Let's say we make txq->fast_free_mp[] an array with the same size as
> txq->txep[].
> At tx_burst(), when filling txep[], we can do the pre_free() checks for that
> mbuf, and in case of success store its mempool pointer in the corresponding
> txq->fast_free_mp[] entry; otherwise put NULL there.
> Then at tx_free() we can scan fast_free_mp[] and invoke raw_free() for the
> non-NULL entries.
> Again, for now it is just an idea, probably worth thinking about.

Yes, that seems like an excellent idea, certainly worth considering!
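A minimal sketch of the per-slot scheme, in plain C so it compiles without DPDK. All sketch_* types and functions are hypothetical stand-ins, not DPDK APIs. The eligibility check is a simplified assumption (direct segment with refcnt == 1) rather than a copy of rte_pktmbuf_prefree_seg(), and the fast path only counts frees instead of calling rte_mbuf_raw_free_bulk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct rte_mempool and struct rte_mbuf:
 * only the fields this idea touches are modelled. */
struct sketch_pool {
    unsigned int nb_fast_freed;  /* returned via the "raw free" path */
    unsigned int nb_slow_freed;  /* returned via the normal free path */
};

struct sketch_mbuf {
    struct sketch_pool *pool;
    struct sketch_mbuf *next;
    uint16_t refcnt;
    uint16_t nb_segs;
    bool direct;                 /* false for indirect/attached mbufs */
};

#define SKETCH_TXQ_SIZE 8

/* Hypothetical Tx queue: fast_free_mp[] has one entry per txep[] slot. */
struct sketch_txq {
    struct sketch_mbuf *txep[SKETCH_TXQ_SIZE];
    struct sketch_pool *fast_free_mp[SKETCH_TXQ_SIZE];
    uint16_t tail;
};

/* Pre-free check done at tx_burst(), while the mbuf is hot in the cache.
 * Simplified assumption: only direct segments with refcnt == 1 qualify;
 * everything else is left to the normal free path at tx_free(). */
static struct sketch_pool *sketch_prefree_check(struct sketch_mbuf *m)
{
    if (m->refcnt != 1 || !m->direct)
        return NULL;
    if (m->next != NULL) {       /* this case writes to the mbuf */
        m->next = NULL;
        m->nb_segs = 1;
    }
    return m->pool;
}

/* tx_burst() side: enqueue every segment of one packet. */
static void sketch_tx_enqueue(struct sketch_txq *q, struct sketch_mbuf *pkt)
{
    for (struct sketch_mbuf *seg = pkt, *next; seg != NULL; seg = next) {
        next = seg->next;        /* read before the check may clear it */
        /* ...the Tx descriptor for seg would be written here... */
        q->txep[q->tail] = seg;
        q->fast_free_mp[q->tail] = sketch_prefree_check(seg);
        q->tail = (uint16_t)((q->tail + 1) % SKETCH_TXQ_SIZE);
    }
}

/* tx_free() side: only the pointer arrays are scanned; an mbuf is
 * dereferenced only on the fallback path. */
static void sketch_tx_free(struct sketch_txq *q, uint16_t first, uint16_t n)
{
    assert(n <= SKETCH_TXQ_SIZE);
    for (uint16_t i = 0; i < n; i++) {
        uint16_t idx = (uint16_t)((first + i) % SKETCH_TXQ_SIZE);
        struct sketch_pool *mp = q->fast_free_mp[idx];
        struct sketch_mbuf *m = q->txep[idx];

        if (mp != NULL)
            mp->nb_fast_freed++;           /* stands in for the raw free */
        else if (--m->refcnt == 0)         /* stands in for the normal free */
            m->pool->nb_slow_freed++;
        q->txep[idx] = NULL;
        q->fast_free_mp[idx] = NULL;
    }
}
```

A real driver would presumably also group consecutive entries with the same mempool pointer into one bulk call, since rte_mbuf_raw_free_bulk() takes a single mempool per call.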

At tx_free(), the mbufs might be cold in the cache, so not accessing them at 
that point improves performance. (That is also the point of my patch.)

At tx_burst(), the mbufs are read anyway (their information is written into the 
tx descriptors), so the mbufs are hot in the cache at this point.

In the best case with your suggestion, rte_pktmbuf_prefree_seg() doesn't write 
to the mbuf, so the performance cost of calling it at tx_burst() is extremely low.

In the worst case with your suggestion, rte_pktmbuf_prefree_seg() does write to 
the mbuf, so the mbuf write operation simply moves from tx_free() to tx_burst().
However, in tx_burst() the mbuf is already hot in the cache, so per transmitted 
mbuf we get one load+store at tx_burst() instead of one load at tx_burst() plus 
one load+store at tx_free().
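The cost argument can be written down as a small per-mbuf tally. This is a simplified model, not a measurement: it assumes the mbuf metadata sits in one cache line that is hot at tx_burst() and possibly cold at tx_free(), and it only encodes the load/store counts stated above. The type and function names are hypothetical.

```c
/* Simplified model of the per-mbuf cost argument (an assumption, not a
 * measurement): mbuf metadata is hot at tx_burst() and possibly cold at
 * tx_free(). */
struct mbuf_accesses {
    unsigned int burst_loads, burst_stores;  /* at tx_burst(): hot */
    unsigned int free_loads, free_stores;    /* at tx_free(): possibly cold */
};

enum free_scheme { SCHEME_TODAY, SCHEME_PROPOSED_BEST, SCHEME_PROPOSED_WORST };

static struct mbuf_accesses accesses_per_mbuf(enum free_scheme s)
{
    switch (s) {
    case SCHEME_PROPOSED_BEST:   /* the check only reads the hot mbuf */
        return (struct mbuf_accesses){ 1, 0, 0, 0 };
    case SCHEME_PROPOSED_WORST:  /* the write moves to tx_burst() */
        return (struct mbuf_accesses){ 1, 1, 0, 0 };
    case SCHEME_TODAY:
    default:                     /* pre-free work is done at tx_free() */
        return (struct mbuf_accesses){ 1, 0, 1, 1 };
    }
}

/* Accesses that may miss the cache, for a burst of n mbufs. */
static unsigned int cold_accesses(enum free_scheme s, unsigned int n)
{
    struct mbuf_accesses a = accesses_per_mbuf(s);
    return n * (a.free_loads + a.free_stores);
}

/* All mbuf loads and stores, for a burst of n mbufs. */
static unsigned int total_accesses(enum free_scheme s, unsigned int n)
{
    struct mbuf_accesses a = accesses_per_mbuf(s);
    return n * (a.burst_loads + a.burst_stores + a.free_loads + a.free_stores);
}
```

For a burst of 32 mbufs, this model gives 64 possibly-cold accesses today versus none with the suggestion, and 96 versus 64 accesses in total even in the worst case.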
