inefficient memset

rguenth at gcc dot gnu.org Wed, 06 Jun 2012 03:06:27 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629


Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-06-06
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-06 10:05:57 UTC ---
Confirmed with -Os on trunk (4.8).  With -O2 we unroll completely to

f:
.LFB0:
        .cfi_startproc
        movq    $0, (%rdi)
        movq    $0, 8(%rdi)
        movq    $0, 16(%rdi)
        movq    $0, 24(%rdi)
        movq    $0, 32(%rdi)
        movq    $0, 40(%rdi)
        movq    $0, 48(%rdi)
        movq    $0, 56(%rdi)
        movq    $0, 64(%rdi)
        movq    $0, 72(%rdi)
        ret

which lacks the size optimization to use a zeroed %rax.  Likewise
for -Os which now looks like

   0:   48 8d 57 30             lea    0x30(%rdi),%rdx
   4:   48 c7 07 00 00 00 00    movq   $0x0,(%rdi)
   b:   48 c7 47 08 00 00 00    movq   $0x0,0x8(%rdi)
  12:   00 
  13:   48 c7 47 10 00 00 00    movq   $0x0,0x10(%rdi)
  1a:   00 
  1b:   48 c7 47 18 00 00 00    movq   $0x0,0x18(%rdi)
  22:   00 
  23:   b9 08 00 00 00          mov    $0x8,%ecx
  28:   48 c7 47 20 00 00 00    movq   $0x0,0x20(%rdi)
  2f:   00 
  30:   48 c7 47 28 00 00 00    movq   $0x0,0x28(%rdi)
  37:   00 
  38:   31 c0                   xor    %eax,%eax
  3a:   48 89 d7                mov    %rdx,%rdi
  3d:   f3 ab                   rep stos %eax,%es:(%rdi)
  3f:   c3                      retq   

I suppose with -Os we use rep stosl because that's one byte smaller ...(?)
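
A rough check of the "one byte smaller" guess, as a sketch rather than a statement about what GCC's cost model actually does. It assumes the alternative would be rep stosq, which needs a REX.W prefix (f3 48 ab, 3 bytes) where rep stosl is f3 ab (2 bytes). The helper name stos_count and the enum constants are made up for illustration. The 80-byte total is read off the -O2 listing, and the 48 bytes already stored are the six movq instructions before the rep stos.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded sizes in bytes: "f3 ab" (rep stosl) vs "f3 48 ab" (rep stosq). */
enum { REP_STOSL_BYTES = 2, REP_STOSQ_BYTES = 3 };

/* Hypothetical helper: the count that must be loaded into %ecx to clear
   tail_bytes using elements of elem_bytes each. */
static size_t stos_count(size_t tail_bytes, size_t elem_bytes)
{
    assert(elem_bytes != 0 && tail_bytes % elem_bytes == 0);
    return tail_bytes / elem_bytes;
}
```

With 80 - 48 = 32 bytes left, stos_count(32, 4) is 8, matching the mov $0x8,%ecx in the listing. A stosq variant would use a count of 4 and cost one more byte for the prefix.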

I suppose the $0x0 optimization should be done post-reload.
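
To put a number on the missed size optimization, here is a small sketch that adds up instruction encodings for the two ways of emitting the stores. The byte counts for the immediate form are read off the disassembly above (48 c7 07 + imm32 is 7 bytes, 48 c7 47 disp8 + imm32 is 8 bytes). The register form assumes the usual encodings xor %eax,%eax = 31 c0, movq %rax,(%rdi) = 48 89 07 and movq %rax,disp8(%rdi) = 48 89 47 disp8. The function names are made up for illustration and are not anything in GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for n consecutive 8-byte zero stores at (%rdi), 8(%rdi), ...
   using "movq $0, mem". The first store has no displacement byte. */
static size_t imm_store_bytes(size_t n)
{
    assert(n <= 16);            /* offsets must still fit in a disp8 */
    if (n == 0)
        return 0;
    return 7 + (n - 1) * 8;
}

/* Same stores using one "xor %eax,%eax" followed by "movq %rax, mem". */
static size_t reg_store_bytes(size_t n)
{
    assert(n <= 16);            /* offsets must still fit in a disp8 */
    if (n == 0)
        return 0;
    return 2 + 3 + (n - 1) * 4; /* xor, first store, remaining stores */
}
```

For the ten stores in the -O2 listing, imm_store_bytes(10) gives 79 bytes and reg_store_bytes(10) gives 41 bytes, not counting the 1-byte ret.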

[Bug rtl-optimization/32629] missing CSE for constant in registers / inefficient memset
