https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-04-03
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|Loop on fixed size array is |Loop on fixed size array is
                   |not unrolled and poorly     |not unrolled and poorly
                   |optimized                   |optimized at -O2
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
With -O3 I even see this vectorized to

_Z4testi:
.LFB0:
        .cfi_startproc
        movabsq $12884901890, %rdx
        movl    $1, (%rdi)
        movq    %rdi, %rax
        movq    %rdx, 8(%rdi)
        movl    %esi, 4(%rdi)
        movdqu  (%rdi), %xmm0
        paddd   .LC0(%rip), %xmm0
        movl    $8, 16(%rdi)
        movups  %xmm0, (%rdi)
        ret

which contains no jumps, so you must be using plain -O2?  There, unrolling
is only done if we estimate that the code will not grow, but the estimate is

  Loop size: 7
  Estimated size after unrolling: 9
Not unrolling loop 1: size would grow.

so you have to specify -funroll-loops, which gives the desired

_Z4testi:
.LFB0:
        .cfi_startproc
        addl    $1, %esi
        movl    $1, (%rdi)
        movq    %rdi, %rax
        movabsq $25769803780, %rdx
        movl    %esi, 4(%rdi)
        movq    %rdx, 8(%rdi)
        movl    $8, 16(%rdi)
        ret

For the unrolling heuristic it is hard to see the (partly) constant
initializer, which is presumably why the size estimate comes out too high.
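
The original testcase is not quoted in this comment.  The sketch below is
a hypothetical reconstruction inferred only from the assembly above: the
mangled name _Z4testi is test(int), the result is five ints returned
through a hidden pointer, and the stored values are consistent with
{1, x, 2, 3, 4} where each element has its index added.  The array type,
the initializer values and the loop body are assumptions, not the
reporter's code.

```cpp
#include <array>

// Hypothetical reconstruction (assumption): a partly constant initializer
// followed by a short loop with a compile-time trip count of 5.
std::array<int, 5> test(int x)
{
    std::array<int, 5> a = {1, x, 2, 3, 4};
    for (int i = 0; i < 5; ++i)
        a[i] += i;          // fully unrolled, this folds to {1, x+1, 4, 6, 8}
    return a;
}

// Possible ways to compare the variants discussed above:
//   g++ -O2 -S t.cc                  (loop kept, per the size estimate)
//   g++ -O2 -funroll-loops -S t.cc   (unrolled and constant-folded)
//   g++ -O3 -S t.cc                  (vectorized)
// Adding -fdump-tree-cunroll-details should show the "Loop size" /
// "Estimated size after unrolling" lines, though the exact dump that
// carries them may differ between GCC versions.
```

The folded constants match the second listing: 25769803780 is
0x0000000600000004, i.e. the ints 4 and 6 stored at offsets 8 and 12,
followed by the store of 8 at offset 16.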
