On Sat, 2 Mar 2024 13:01:51 +0100
Mattias Rönnblom <[email protected]> wrote:
> I ran some DSW benchmarks, and if you add
>
> diff --git a/lib/eal/x86/include/rte_memcpy.h
> b/lib/eal/x86/include/rte_memcpy.h
> index 72a92290e0..64cd82d78d 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src,
> size_t n)
> static __rte_always_inline void *
> rte_memcpy(void *dst, const void *src, size_t n)
> {
> + if (__builtin_constant_p(n) && n <= 32) {
> + memcpy(dst, src, n);
> + return dst;
> + }
> +
The default GCC inline-copy threshold is 64 bytes (i.e. the cache line size),
and that makes sense. Since you are already using __builtin_constant_p, you
could do:
	if (__builtin_constant_p(n) && n < RTE_CACHE_LINE_SIZE)
		return __builtin_memcpy(dst, src, n);