https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119596
ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ak at gcc dot gnu.org

--- Comment #9 from ak at gcc dot gnu.org ---
The problem here seems to be that REP MOVSQ is generated. The Intel CPUs have
optimizations for short strings (enumerated by the "fast short strings CPUID"),
but they only work with MOVSB, not MOVSQ. Most likely your regression would go
away with MOVSB.

gcc has this:

/* X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB: Enable use of REP MOVSB/STOSB
   to move/set sequences of bytes with known size.  */
DEF_TUNE (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB,
          "prefer_known_rep_movsb_stosb",
          m_SKYLAKE | m_CORE_HYBRID | m_CORE_ATOM | m_TREMONT | m_CORE_AVX512
          | m_ZHAOXIN)

Likely this needs to be enabled for SPR too.
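
For anyone who wants to check this on their own machine, here is a minimal sketch in C. It is not taken from the GCC sources. The helper names cpu_has_fsrm and copy_rep_movsb are made up for illustration. It assumes the CPUID feature referred to above is Fast Short REP MOV (FSRM), which Intel documents as CPUID leaf 7, subleaf 0, EDX bit 4. The second helper forces a REP MOVSB copy through inline asm so it can be timed against whatever the compiler emits for the same fixed-size copy. On non-x86 targets both helpers fall back to harmless defaults.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Hypothetical helper: returns 1 if CPUID reports Fast Short REP MOV
   (leaf 7, subleaf 0, EDX bit 4), 0 if it does not or if this is not
   an x86 target.  */
static int
cpu_has_fsrm (void)
{
#if HAVE_X86
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (edx >> 4) & 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: copy N bytes with an explicit REP MOVSB.
   Relies on the direction flag being clear, as the ABI guarantees
   at function entry.  Falls back to memcpy on non-x86 targets.  */
static void *
copy_rep_movsb (void *dst, const void *src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
#if HAVE_X86
  void *d = dst;
  __asm__ volatile ("rep movsb"
                    : "+D" (d), "+S" (src), "+c" (n)
                    :
                    : "memory");
#else
  memcpy (dst, src, n);
#endif
  return dst;
}
```

If cpu_has_fsrm returns 1 on the affected machine, timing copy_rep_movsb against the REP MOVSQ sequence from the regression would be one way to test the theory. It may also be possible to try the tuning without patching GCC by passing -mtune-ctrl=prefer_known_rep_movsb_stosb, though that is a developer option and I have not checked it against this test case.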