https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119596

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ak at gcc dot gnu.org

--- Comment #9 from ak at gcc dot gnu.org ---
The problem here seems to be that REP MOVSQ is generated.

Intel CPUs have optimizations for short string operations (enumerated by the
"fast short" CPUID feature bits, such as FSRM, Fast Short REP MOV), but they
only apply to REP MOVSB, not REP MOVSQ.
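
For reference, a minimal sketch of how to check that CPUID bit from C. It
assumes GCC or Clang on x86 (for <cpuid.h>) and that the relevant bit is FSRM
(CPUID leaf 7, subleaf 0, EDX bit 4); has_fsrm is a hypothetical helper name,
not something from GCC itself.

```c
#include <cpuid.h>   /* GCC/Clang helper for the x86 CPUID instruction */

/* Returns 1 if the CPU reports Fast Short REP MOV (FSRM), else 0.
   FSRM is CPUID leaf 7, subleaf 0, EDX bit 4.  */
int has_fsrm(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count returns 0 when the leaf is not supported.  */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 4) & 1;
}
```

Calling has_fsrm() on the affected machine should show whether the short-copy
optimization is advertised at all.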

Most likely your regression would go away with MOVSB.
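
To make the difference concrete, here is a rough sketch of the two copy forms
using GCC inline asm. It assumes x86-64 and a clear direction flag (as the ABI
requires); the function names are made up for illustration, and this is not
the exact code GCC emits for the test case.

```c
#include <stddef.h>
#include <string.h>

/* Byte-granular copy: the form the short-copy optimization applies to.  */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
}

/* Quadword-granular copy: 8 bytes per iteration, plus a byte tail.  */
static void copy_rep_movsq(void *dst, const void *src, size_t n)
{
    size_t qwords = n / 8;
    size_t tail = n % 8;

    __asm__ volatile ("rep movsq"
                      : "+D" (dst), "+S" (src), "+c" (qwords)
                      :
                      : "memory");
    /* dst and src now point just past the copied quadwords.  */
    memcpy(dst, src, tail);
}
```

Timing both on small sizes on the affected machine would be one way to check
whether MOVSQ is really what hurts here.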

gcc has this:

/* X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB: Enable use of REP MOVSB/STOSB to
   move/set sequences of bytes with known size.  */
DEF_TUNE (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB,
          "prefer_known_rep_movsb_stosb",
          m_SKYLAKE | m_CORE_HYBRID | m_CORE_ATOM | m_TREMONT | m_CORE_AVX512
          | m_ZHAOXIN)


This likely needs to be enabled for SPR (Sapphire Rapids) too.
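
One way to test this without patching GCC might be the developer option
-mtune-ctrl, which can turn individual tuning flags on. A small sketch follows;
the file name copy.c, the struct, and the 256-byte size are made up for
illustration, and whether GCC actually picks REP MOVSB for this size also
depends on the other tuning tables.

```c
#include <string.h>

/* Hypothetical test case: a copy whose size is known at compile time.
   Try, for example:
     gcc -O2 -march=sapphirerapids -S copy.c
     gcc -O2 -march=sapphirerapids \
         -mtune-ctrl=prefer_known_rep_movsb_stosb -S copy.c
   and compare the generated assembly.  */
struct blob { char data[256]; };

void copy_blob(struct blob *dst, const struct blob *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

If the second build uses REP MOVSB and the regression disappears, that would
support enabling the flag for SPR.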
