https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435
Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Ugh. Stringop tuning for Ryzens is terribly anachronistic. All AMD processors
since K8 (!!) use the exact same tables, and 32-bit memset/memcpy don't use a
libcall for large sizes:

static stringop_algs znver2_memcpy[2] = {
  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
             {-1, rep_prefix_4_byte, false}}},
  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
             {-1, libcall, false}}}};

(The first subarray is the 32-bit tuning, the second is for 64-bit.)

Using the test_stringop microbenchmark from PR43052, it's easy to see that
library memset/memcpy are fastest for sizes 256 and above. Below that, the
microbenchmark results may be debatable; for I-cache locality reasons, I think
we should prefer the libcall almost always, except for the tiniest sizes.

But anyway, the current tuning is completely inappropriate.
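
To make the table above easier to read, here is a minimal sketch of how such a table can be interpreted. It assumes each entry means "use this algorithm for sizes up to max, with -1 meaning no upper bound", and that the leading libcall is the choice when the size is unknown. The struct layout, the MAX_ALGS constant, and the pick_alg helper are hypothetical and simplified for illustration; the selection logic in GCC itself takes more factors into account.

```c
#include <stdbool.h>

/* Simplified model of the tuning table quoted in the comment. The type and
   field names are assumptions made for this sketch, not GCC's headers. */
#define MAX_ALGS 4

enum stringop_alg { libcall, loop, unrolled_loop, rep_prefix_4_byte };

struct stringop_strategy {
  long max;               /* largest size covered; -1 means no upper bound */
  enum stringop_alg alg;  /* expansion used for sizes up to max */
  bool noalign;           /* third field of each entry; unused in this sketch */
};

struct stringop_algs {
  enum stringop_alg unknown_size;  /* assumed: used when the size is unknown */
  struct stringop_strategy size[MAX_ALGS];
};

/* Index 0 is the 32-bit tuning, index 1 the 64-bit tuning. */
static const struct stringop_algs znver2_memcpy[2] = {
  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
             {-1, rep_prefix_4_byte, false}}},
  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
             {-1, libcall, false}}}};

/* Hypothetical helper: return the first entry whose bound covers `size`. */
static enum stringop_alg pick_alg(const struct stringop_algs *algs, long size)
{
  for (int i = 0; i < MAX_ALGS; i++) {
    long max = algs->size[i].max;
    if (max == -1 || size <= max)
      return algs->size[i].alg;
  }
  return libcall;  /* fallback if no entry matched */
}
```

Under this reading, pick_alg(&znver2_memcpy[0], 1000) yields rep_prefix_4_byte, so the 32-bit table never reaches the library call for a known size, while pick_alg(&znver2_memcpy[1], 256) yields libcall, matching the 64-bit row.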