https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120708

            Bug ID: 120708
           Summary: ix86_expand_set_or_cpymem ignores MOVE_MAX
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
                CC: liuhongt at gcc dot gnu.org
  Target Milestone: ---
            Target: x86-64

i386 defines

/* Max number of bytes we can move from memory to memory in one
   reasonably fast instruction, as opposed to MOVE_MAX_PIECES which
   is the number of bytes at a time which we can move efficiently.
   MOVE_MAX_PIECES defaults to MOVE_MAX.  */

#define MOVE_MAX \
  ((TARGET_AVX512F \
    && (ix86_move_max == PVW_AVX512 \
        || ix86_store_max == PVW_AVX512)) \
   ? 64 \
   : ((TARGET_AVX \
       && (ix86_move_max >= PVW_AVX256 \
           || ix86_store_max >= PVW_AVX256)) \
      ? 32 \
      : ((TARGET_SSE2 \
          && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
          && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
         ? 16 : UNITS_PER_WORD)))

If either TARGET_SSE_UNALIGNED_LOAD_OPTIMAL or
TARGET_SSE_UNALIGNED_STORE_OPTIMAL is false and neither AVX case applies,
MOVE_MAX is defined as UNITS_PER_WORD.  For -march=atom, both are false.
But ix86_expand_set_or_cpymem ignores MOVE_MAX.  As a result,
memcpy-vector_loop-1.c and memset-vector_loop-2.c, which are compiled with
-march=atom, are compiled with SSE instructions:

        movdqa  %xmm3, a(%rax)
        movdqa  %xmm2, a+16(%rax)
        movdqa  %xmm1, a+32(%rax)
        movdqa  %xmm0, a+48(%rax)
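
For reference, the MOVE_MAX selection quoted above can be modelled outside
GCC roughly as follows.  This is only a sketch: struct target_sketch, enum
pvw_sketch and move_max_sketch are hypothetical stand-ins for the target
flags and tuning variables, not GCC's real data structures.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vector-width preference; only the
   relative ordering of the values matters for this sketch.  */
enum pvw_sketch { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

/* Hypothetical snapshot of the state that the MOVE_MAX macro reads.  */
struct target_sketch
{
  bool avx512f;                      /* models TARGET_AVX512F */
  bool avx;                          /* models TARGET_AVX */
  bool sse2;                         /* models TARGET_SSE2 */
  bool sse_unaligned_load_optimal;   /* models TARGET_SSE_UNALIGNED_LOAD_OPTIMAL */
  bool sse_unaligned_store_optimal;  /* models TARGET_SSE_UNALIGNED_STORE_OPTIMAL */
  enum pvw_sketch move_max;          /* models ix86_move_max */
  enum pvw_sketch store_max;         /* models ix86_store_max */
  int units_per_word;                /* models UNITS_PER_WORD */
};

/* Same decision chain as the MOVE_MAX macro, written as a function.  */
static int
move_max_sketch (const struct target_sketch *t)
{
  if (t->avx512f
      && (t->move_max == PVW_AVX512 || t->store_max == PVW_AVX512))
    return 64;
  if (t->avx
      && (t->move_max >= PVW_AVX256 || t->store_max >= PVW_AVX256))
    return 32;
  if (t->sse2
      && t->sse_unaligned_load_optimal
      && t->sse_unaligned_store_optimal)
    return 16;
  return t->units_per_word;
}
```

With SSE2 available but both unaligned-access tunings false, as described
above for -march=atom, this model returns units_per_word, so a 16-byte
movdqa is wider than MOVE_MAX allows.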
