https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119704

            Bug ID: 119704
           Summary: x86: partially disobeyed strategy rep-based request
                    for inlined memset
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mjguzik at gmail dot com
  Target Milestone: ---

GCC 13.3.0 runs into it. I also tested on godbolt, which reports 15.0.1:

gcc
(Compiler-Explorer-Build-gcc-ca4e6e6317ae0ceada8c46ef5db5ece165a6d1c4-binutils-2.42)
15.0.1 20250409 (experimental)

... and got the same result.

I have not verified memcpy, but I suspect it suffers from the same problem.
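A minimal sketch of a memcpy counterpart to the memset test case below, for
anyone who wants to check that suspicion. The function names copy and
copy_roundtrip_ok are made up for illustration, and the SIZE default of 48 is
only a convenience. This has not been run against any GCC version as part of
this report, so no codegen result is claimed:

```c
#include <assert.h>

#ifndef SIZE
#define SIZE 48                 /* assumed default; override with -DSIZE=n */
#endif

/* memcpy analogue of zero(): constant-size copy the compiler may inline. */
void copy(char *dst, const char *src)
{
        __builtin_memcpy(dst, src, SIZE);
}

/* Functional sanity check only; it says nothing about which instructions
 * were emitted. Returns 1 if the first SIZE bytes were copied. */
int copy_roundtrip_ok(void)
{
        char src[SIZE], dst[SIZE];
        int i;

        for (i = 0; i < SIZE; i++) {
                src[i] = (char)(i + 1);
                dst[i] = 0;
        }
        copy(dst, src);
        for (i = 0; i < SIZE; i++)
                if (dst[i] != src[i])
                        return 0;
        assert(i == SIZE);
        return 1;
}
```

The corresponding switch would presumably be
-mmemcpy-strategy=rep_byte:256:noalign,libcall:-1:noalign, with the
disassembly inspected for rep movsb versus plain loads and stores.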

src:
void zero(char *buf)
{
        __builtin_memset(buf, 0, SIZE);
}

compiled like so:

cc -O2 -DSIZE=48 -mno-sse \
   -mmemset-strategy=rep_byte:256:noalign,libcall:-1:noalign -c zero.c

Given rep_byte I expect rep stosb to be emitted. It does happen for some sizes,
but I'm also seeing regular stores or rep stosl.

For sizes 40 bytes and below this still emits regular stores, *not* the
rep-prefixed op, for example:
0000000000000000 <zero>:
   0:   f3 0f 1e fa             endbr64
   4:   48 c7 07 00 00 00 00    movq   $0x0,(%rdi)
   b:   48 c7 47 08 00 00 00    movq   $0x0,0x8(%rdi)
  12:   00
  13:   48 c7 47 10 00 00 00    movq   $0x0,0x10(%rdi)
  1a:   00
  1b:   48 c7 47 18 00 00 00    movq   $0x0,0x18(%rdi)
  22:   00
  23:   48 c7 47 20 00 00 00    movq   $0x0,0x20(%rdi)
  2a:   00
  2b:   c3                      ret

48 bytes is rep stosl:
0000000000000000 <zero>:
   0:   f3 0f 1e fa             endbr64
   4:   b9 0c 00 00 00          mov    $0xc,%ecx
   9:   31 c0                   xor    %eax,%eax
   b:   f3 ab                   rep stos %eax,%es:(%rdi)
   d:   c3                      ret

64 bytes is rep stosl:
0000000000000000 <zero>:
   0:   f3 0f 1e fa             endbr64
   4:   b9 10 00 00 00          mov    $0x10,%ecx
   9:   31 c0                   xor    %eax,%eax
   b:   f3 ab                   rep stos %eax,%es:(%rdi)
   d:   c3                      ret

65 bytes is rep stosb:
0000000000000000 <zero>:
   0:   f3 0f 1e fa             endbr64
   4:   b9 41 00 00 00          mov    $0x41,%ecx
   9:   31 c0                   xor    %eax,%eax
   b:   f3 aa                   rep stos %al,%es:(%rdi)
   d:   c3                      ret

Given the rep_byte strategy I expect every size up to 256 bytes to use rep
stosb.
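To compare the sizes in one go rather than recompiling with a different
-DSIZE each time, something like the following sketch could be used. The
DEFINE_ZERO macro, the zero_N names, and the helper functions are made up for
illustration. The sizes are the ones discussed in this report:

```c
#include <assert.h>
#include <stddef.h>

/* One out-of-line function per size, so each can be found in objdump. */
#define DEFINE_ZERO(n) \
        void zero_##n(char *buf) { __builtin_memset(buf, 0, (n)); }

DEFINE_ZERO(40)         /* reported: plain movq stores */
DEFINE_ZERO(48)         /* reported: rep stos %eax */
DEFINE_ZERO(64)         /* reported: rep stos %eax */
DEFINE_ZERO(65)         /* reported: rep stos %al */

/* Count nonzero bytes among the first n bytes of buf. */
size_t count_nonzero(const char *buf, size_t n)
{
        size_t i, count = 0;

        for (i = 0; i < n; i++)
                if (buf[i] != 0)
                        count++;
        return count;
}

/* Functional check only: each variant clears exactly its own length. */
void check_sizes(void)
{
        char buf[65];
        size_t i;

        for (i = 0; i < sizeof(buf); i++)
                buf[i] = 1;
        zero_40(buf);
        assert(count_nonzero(buf, sizeof(buf)) == 25);
        zero_48(buf);
        assert(count_nonzero(buf, sizeof(buf)) == 17);
        zero_64(buf);
        assert(count_nonzero(buf, sizeof(buf)) == 1);
        zero_65(buf);
        assert(count_nonzero(buf, sizeof(buf)) == 0);
}
```

Compiling this with the same flags as above (minus -DSIZE) and running
objdump -d on the object file should show all four variants side by side.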
