>>> From: Pavan Nikhilesh <[email protected]> >>> >>> Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the >>> optimal burst size. >>> >>> Set default value to 64 for soc_cn10k and 32 generally. >>> >>> Signed-off-by: Pavan Nikhilesh <[email protected]> >>> --- >>> This improves performance by 5% on l2fwd, other examples showed >>> negligible difference on CN10K. >>> >> >>I support the concept of having a recommended mbuf burst size, targeting the >>majority of generic applications. >>Making it CPU dependent seems like a good choice. >> >>It should be named differently. >>First of all, "optimal" depends on the use case; if targeting low latency, >>shorter bursts are better, so "OPTIMAL" should not be part of the name. >>Second, I would guess that it only targets mbuf bursts, not also bursts of >>other operations (e.g. hash lookups), so "MBUF" should be part of the name. >> >>Suggestion: >>/* Recommended burst size for generic applications, striking a balance >>between throughput and latency. */ >>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64) >> > >Agreed, would the comment be enough to say that it is a recommendation and not >an enforcement? or should it be added to the macro name? >I am sceptical of changing burst size of 64 since most of the applications >_today_ use 32, might cause unintended regression. > >RTE_MBUF_BURST_SIZE_(REC)_PERF? > >><feature creep> >>/* Recommended burst size for generic applications targeting low latency. */ >>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4) >></feature creep> > >RTE_MBUF_BURST_SIZE_(REC)_LAT? > >(I am bad at names) >> >>Having these standardized will also allow libraries and drivers to optimize >>for them, e.g. drivers should support bursts sizes all the way down to >>RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that the >>RTE_MBUF_BURST_SIZE_MIN is not lower than supported by the driver/hardware. >> >><more feature creep> >>rte_config.h could have "#define RTE_MBUF_BURST_SIZE >>RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to >>RTE_MBUF_BURST_SIZE_MIN for low latency applications. >>This will let the libraries and drivers optimize for the specific burst size >>used by the application. >></more feature creep> > >This is fine with me, we can wrap it around a meson option to avoid manually >changing rte_config.h > >> >><rambling> >>Intuitively, I would assume that the optimal burst size essentially depends >>on the CPU's L1D cache size and the application's number of non-mbuf cache >>lines accessed per burst. >>Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst >>touches 4 cache lines per packet: >>2 cache lines for the mbuf >>1 cache line for the packet data >>1 cache line per packet for some table lookup/forwarding entry >> >>Then the mbuf burst should be max 512/4 = 128. >>But local variables also use memory during processing, so using a burst of 64 >>would leave room for that and some more. >></rambling> > >We could probably read `/sys/devices/system/cpu/cpu0/cache/index0/size` in >meson and calculate the number of lines and burst but, I dont think its >that simple, for example, CN10K has 64KiB L1D cache and anything above 64 >burst size causes performance loss. > >Thanks, >Pavan >
@Morten, Any further thoughts? I can spin up a v1.

