On a 32-core system the test matrix runs the cartesian product of
4 mempools, 3 core-count configurations and ~340 (n_keep, bulk)
points at TIME_S = 1 second each: about 67 minutes total, well past
the 10-minute perf-test timeout.
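
As a rough check of that estimate, the sweep time is just the product of
the matrix dimensions and the per-point duration. The sketch below is
illustrative only: sweep_runtime_ms() is a hypothetical helper, not part
of the test, and 340 is the rounded point count quoted above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in test_mempool_perf.c): total wall-clock
 * time in milliseconds for a full cartesian sweep of the test matrix.
 */
static uint64_t
sweep_runtime_ms(uint64_t mempools, uint64_t core_cfgs,
		 uint64_t points, uint64_t ms_per_point)
{
	return mempools * core_cfgs * points * ms_per_point;
}

/* Convert milliseconds to whole minutes, truncating. */
static uint64_t
ms_to_minutes(uint64_t ms)
{
	return ms / (60 * 1000);
}
```

With the rounded inputs, sweep_runtime_ms(4, 3, 340, 1000) is 4080000 ms,
i.e. 68 minutes, in line with the figure above; the same matrix at
200 ms per point would still take about 13 minutes, which is why the
matrix is trimmed as well.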
Two reductions bring the runtime down with no loss of meaningful signal:
1. Per-point duration: 1 second -> 200 ms. Each point currently
collects 10^5-10^6 mempool ops; at one fifth of the duration,
200 ms still yields >10^4 samples, well above the noise floor
for a cycles-per-op average.
2. Matrix trim: drop adjacent bulk and n_keep points that do not
mark a regime change. The retained set covers the boundaries
that matter: 1, 4, cache-line burst (8), typical packet burst
(32) and cache size (RTE_MEMPOOL_CACHE_MAX_SIZE = 512) for bulk;
32 (fits in cache), 512 (= cache size) and 32768 (far exceeds
cache) for n_keep.
Combined effect: ~10x runtime reduction (5x from the shorter
per-point duration, the rest from the smaller matrix).
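
The duration change also alters how the loop deadline is computed. The
old check divided elapsed cycles by the timer frequency, which truncates
to whole seconds and cannot express 200 ms. A minimal sketch of the two
forms follows; the function names are hypothetical, and hz stands for the
timer frequency in cycles per second.

```c
#include <stdint.h>

/*
 * Cycle budget for a window of 'ms' milliseconds. Multiplying before
 * dividing keeps sub-second precision in integer arithmetic.
 */
static uint64_t
budget_cycles(uint64_t hz, uint64_t ms)
{
	return hz * ms / 1000;
}

/* Old form: compares whole elapsed seconds (integer division truncates). */
static int
keep_running_seconds(uint64_t time_diff, uint64_t hz, uint64_t time_s)
{
	return time_diff / hz < time_s;
}

/* New form: compares elapsed cycles against a millisecond budget. */
static int
keep_running_ms(uint64_t time_diff, uint64_t hz, uint64_t time_ms)
{
	return time_diff < budget_cycles(hz, time_ms);
}
```

Assuming a 2 GHz timer for illustration, budget_cycles(2000000000, 200)
is 400000000 cycles, so the new loop stops after 200 ms, while the old
form keeps running until a full second of cycles has elapsed.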
Signed-off-by: Stephen Hemminger <[email protected]>
---
app/test/test_mempool_perf.c | 32 ++++++++++++--------------------
1 file changed, 12 insertions(+), 20 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index dd2f0bbaca..6801812a8d 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -61,26 +61,21 @@
*
* - Pseudorandom max bulk size (*n_max_bulk*)
*
- * - Max bulk from CACHE_LINE_BURST to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE,
- * where CACHE_LINE_BURST is the number of pointers fitting into one CPU cache line.
+ * - Max bulk: CACHE_LINE_BURST, 32, RTE_MEMPOOL_CACHE_MAX_SIZE,
+ * where CACHE_LINE_BURST is the number of pointers fitting into
+ * one CPU cache line.
*
* - Fixed bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
- * - Bulk put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
- * - Bulk get and put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
+ * - Bulk get: 1, 4, CACHE_LINE_BURST, 32, RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk put: 1, 4, CACHE_LINE_BURST, 32, RTE_MEMPOOL_CACHE_MAX_SIZE
*
* - Number of kept objects (*n_keep*)
*
- * - 32
- * - 128
- * - 512
- * - 2048
- * - 8192
- * - 32768
+ * - 32, 512, 32768
*/
-#define TIME_S 1
+#define TIME_MS 200
#define MEMPOOL_ELT_SIZE 2048
#define MAX_KEEP 32768
#define N (128 * MAX_KEEP)
@@ -257,7 +252,7 @@ per_lcore_mempool_test(void *arg)
start_cycles = rte_get_timer_cycles();
- while (time_diff/hz < TIME_S) {
+ while (time_diff < hz * TIME_MS / 1000) {
if (n_max_bulk != 0)
ret = test_loop_random(mp, cache, n_keep, n_max_bulk);
else if (!use_constant_values)
@@ -376,13 +371,10 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
static int
do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
{
- unsigned int bulk_tab_max[] = { CACHE_LINE_BURST, 32, 64, 128, 256,
- RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
- RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
- RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
+ unsigned int bulk_tab_max[] = { CACHE_LINE_BURST, 32, RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int keep_tab[] = { 32, 512, 32768, 0 };
unsigned int *max_bulk_ptr;
unsigned int *get_bulk_ptr;
unsigned int *put_bulk_ptr;