https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-04-03
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|Loop on fixed size array is |Loop on fixed size array is
                   |not unrolled and poorly     |not unrolled and poorly
                   |optimized                   |optimized at -O2
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
With -O3 I even see this vectorized to

_Z4testi:
.LFB0:
        .cfi_startproc
        movabsq $12884901890, %rdx
        movl    $1, (%rdi)
        movq    %rdi, %rax
        movq    %rdx, 8(%rdi)
        movl    %esi, 4(%rdi)
        movdqu  (%rdi), %xmm0
        paddd   .LC0(%rip), %xmm0
        movl    $8, 16(%rdi)
        movups  %xmm0, (%rdi)
        ret

but no jumps so you must use plain -O2?  Here unrolling is only done if
we estimate the code to not grow but the estimate is

Loop size: 7
Estimated size after unrolling: 9
Not unrolling loop 1: size would grow.

so you have to specify -funroll-loops where we get the desired

_Z4testi:
.LFB0:
        .cfi_startproc
        addl    $1, %esi
        movl    $1, (%rdi)
        movq    %rdi, %rax
        movabsq $25769803780, %rdx
        movl    %esi, 4(%rdi)
        movq    %rdx, 8(%rdi)
        movl    $8, 16(%rdi)
        ret

then.  For the unrolling heuristic it's hard to see the (partly) constant
initializer.
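
The test case itself is not quoted in this comment, so the following is
only a hypothetical reconstruction inferred from the two listings above.
It assumes a five-int aggregate returned through the hidden pointer in
%rdi, initialized to {1, x, 2, 3, 4}, with a loop adding the index to
each element.  The use of std::array, the parameter name x and the exact
loop body are all assumptions; only the mangled name _Z4testi (that is,
test(int)) comes from the listings.  The constants are consistent with
this guess: 12884901890 = 3*2^32 + 2 packs the initial elements 2 and 3,
and 25769803780 = 6*2^32 + 4 packs the final elements 4 and 6.

```cpp
#include <array>

// Hypothetical reconstruction, not the reporter's actual source.
// Partly constant initializer: only element 1 depends on the argument.
std::array<int, 5> test(int x)
{
    std::array<int, 5> a = {1, x, 2, 3, 4};

    // Fixed trip count of 5; once fully unrolled, every element except
    // a[1] folds to a constant: {1, x + 1, 4, 6, 8}.
    for (int i = 0; i < 5; ++i)
        a[i] += i;

    return a;
}
```

Compiling a file like this with g++ -O2 -S, then with -O3, then with
-O2 -funroll-loops should allow comparing the three code shapes
discussed above, though the exact output depends on the GCC version and
target.  Size estimates of the kind quoted above should appear in the
dump produced by -fdump-tree-cunroll-details.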