> From: Stephen Hemminger [mailto:[email protected]]
> Sent: Friday, 29 May 2026 19.11
>
> On a 32-core system the test matrix runs the cartesian product of
> 4 mempools, 3 core-count configurations and ~340 (n_keep, bulk)
> points at TIME_S=1 second each: about 67 minutes total, well past
> the 10 minute perf-test timeout.
>
> Two reductions, no loss of meaningful signal:
>
> 1. Per-point duration: 1 second -> 200 ms. Each point currently
>    collects 10^5-10^6 mempool ops; 200 ms still yields >10^4
>    samples, well above the noise floor for a cycles-per-op average.
Ack to this.

>
> 2. Matrix trim: drop adjacent bulk and n_keep points that don't
>    produce regime changes. Retained set covers the boundaries
>    that matter: 1, 4, cache-line burst (8), typical packet burst
>    (32) and cache size (RTE_MEMPOOL_CACHE_MAX_SIZE = 512) for bulk;
>    32 (fits in cache), 512 (= cache size) and 32768 (far exceeds
>    cache) for n_keep.

My mempool optimization patch [1] introduces a bounce buffer limit, so huge requests are not needlessly copied twice to bounce through the cache, but are moved directly between application memory and the mempool backend driver.
The bounce buffer limit is half the cache size.
So, please keep 256. Maybe change it to RTE_MEMPOOL_CACHE_MAX_SIZE / 2.

[1]: https://patchwork.dpdk.org/project/dpdk/patch/[email protected]/

Also consider keeping 64; it seems to be a popular default burst size for some CPUs.
On the other hand, if the patch introducing default mbuf burst sizes [2] gets accepted, we could replace 32 with RTE_MBUF_BURST_SIZE_THROUGHPUT and 4 with RTE_MBUF_BURST_SIZE_LATENCY.
No strong opinion on 64; I'll leave that up to you.

[2]: https://patchwork.dpdk.org/project/dpdk/list/?series=37914

>
> Combined effect: ~10x runtime reduction.
>
> Signed-off-by: Stephen Hemminger <[email protected]>

With suggested changes,
Acked-by: Morten Brørup <[email protected]>
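
For illustration only, a rough sketch of how the retained sets could look with the suggested 64 and RTE_MEMPOOL_CACHE_MAX_SIZE / 2 entries added, plus the runtime arithmetic from the commit message. The names bulk_sizes, keep_sizes, N_ELEMS and estimate_runtime_ms are hypothetical and not taken from the patch. The local fallback definition of RTE_MEMPOOL_CACHE_MAX_SIZE is an assumption based on the value quoted above, so that the snippet builds outside DPDK.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 512 is the value quoted in the commit message.
 * Inside DPDK the macro comes from the build configuration. */
#ifndef RTE_MEMPOOL_CACHE_MAX_SIZE
#define RTE_MEMPOOL_CACHE_MAX_SIZE 512
#endif

/* The suggested half-cache point only makes sense for an even cache size. */
static_assert(RTE_MEMPOOL_CACHE_MAX_SIZE % 2 == 0,
	"half-cache bulk point assumes an even cache size");

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical array: retained bulk sizes plus the two suggested points. */
static const unsigned int bulk_sizes[] = {
	1, 4, 8, 32,
	64,                             /* suggested: popular burst size */
	RTE_MEMPOOL_CACHE_MAX_SIZE / 2, /* suggested: bounce buffer limit */
	RTE_MEMPOOL_CACHE_MAX_SIZE,     /* cache size */
};

/* Hypothetical array: retained n_keep values from the commit message. */
static const unsigned int keep_sizes[] = { 32, 512, 32768 };

/* Wall time in milliseconds for one full run of the test matrix. */
static unsigned long
estimate_runtime_ms(unsigned int n_mempools, unsigned int n_core_cfgs,
		unsigned int n_points, unsigned int ms_per_point)
{
	return (unsigned long)n_mempools * n_core_cfgs * n_points * ms_per_point;
}
```

With the figures quoted above, estimate_runtime_ms(4, 3, 340, 1000) gives 4,080,000 ms (about 68 minutes), and 200 ms per point gives 816,000 ms (about 13.6 minutes). That is still over the 10 minute timeout, so the matrix trim looks necessary in addition to the shorter duration.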

