znver2 and 32bit

amonakov at gcc dot gnu.org Sat, 30 May 2020 09:44:05 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435


Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Ugh. Stringop tuning for Ryzens is terribly anachronistic: all AMD processors
since K8 (!!) use the exact same tables, and 32-bit memset/memcpy don't use a
libcall for large sizes:

static stringop_algs znver2_memcpy[2] = {
  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
             {-1, rep_prefix_4_byte, false}}},
  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
             {-1, libcall, false}}}};

(first subarray is 32-bit tuning, the second is for 64-bit)
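
To make the table easier to read, here is a minimal standalone sketch of how
such a table maps a copy size to a strategy. The struct layout is a simplified
model, and pick_alg is a hypothetical helper written for illustration; GCC's
real selection logic takes more into account (alignment, command-line
overrides, whether the size is known at compile time). The assumption here is
only that the first entry whose max covers the size wins, with -1 meaning
"no upper bound".

```c
#include <assert.h>

/* Simplified model of the tuning table; not GCC's actual definitions. */
enum stringop_alg { libcall, loop, unrolled_loop, rep_prefix_4_byte };

#define MAX_STRINGOP_ALGS 4

struct stringop_strategy {
  int max;                /* largest size covered; -1 means no upper bound */
  enum stringop_alg alg;  /* strategy to use for sizes up to max */
  int noalign;            /* mirrors the third field in the quoted table */
};

struct stringop_algs {
  enum stringop_alg unknown_size;  /* strategy when the size is not known */
  struct stringop_strategy size[MAX_STRINGOP_ALGS];
};

/* Same values as the table quoted above: [0] is 32-bit, [1] is 64-bit. */
static const struct stringop_algs znver2_memcpy[2] = {
  {libcall, {{6, loop, 0}, {14, unrolled_loop, 0},
             {-1, rep_prefix_4_byte, 0}}},
  {libcall, {{16, loop, 0}, {64, rep_prefix_4_byte, 0},
             {-1, libcall, 0}}}};

/* Hypothetical helper: return the first strategy whose max covers nbytes. */
static enum stringop_alg pick_alg(const struct stringop_algs *t, long nbytes)
{
  assert(nbytes >= 0);
  for (int i = 0; i < MAX_STRINGOP_ALGS; i++)
    if (t->size[i].max == -1 || nbytes <= t->size[i].max)
      return t->size[i].alg;
  return libcall;  /* no entry matched: fall back to the library call */
}
```

Under this model, pick_alg(&znver2_memcpy[0], 4096) yields rep_prefix_4_byte
while pick_alg(&znver2_memcpy[1], 4096) yields libcall, which is the 32-bit
versus 64-bit difference described above.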

Using the test_stringop microbenchmark from PR43052, it's easy to see that
library memset/memcpy are fastest on sizes 256 and above. Below that, the
result from the microbenchmark may be debatable, but I think we should prefer
the libcall almost always, except for the tiniest sizes, for I-cache locality
reasons.

But anyway, the current tuning is completely inappropriate.

[Bug target/95435] bad builtin memcpy performance with znver1/znver2 and 32bit
