http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60482
Bug ID: 60482
Summary: Loop optimization regression
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: yvan.roux at linaro dot org

Created attachment 32323
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32323&action=edit
trunk.s

Hi,

I didn't have time to investigate further, but I want to quickly flag that
the code below was optimized at r204283 by taking the loop's trip count
information into account, and it no longer is on trunk (I spotted the issue
on AArch64 and x86_64).

code:

typedef double adouble __attribute__ ((__aligned__(16)));

double p1(adouble *x, int n)
{
  double p1_ = 0.0;
  (!(n % 128) == 0) ? __builtin_unreachable() : 1 ;
  for (int i=0; i<n; i++)
    p1_ += x[i] ;
  return p1_ ;
}

compiled with flags: -Ofast -std=c99

x86_64 generated assembly at r204283:

p1:
.LFB0:
    .cfi_startproc
    testl   %esi, %esi
    jle     .L5
    pxor    %xmm1, %xmm1
    shrl    %esi
    xorl    %eax, %eax
.L4:
    movq    %rax, %rdx
    addq    $1, %rax
    salq    $4, %rdx
    cmpl    %eax, %esi
    addpd   (%rdi,%rdx), %xmm1
    ja      .L4
    movapd  %xmm1, %xmm0
    unpckhpd %xmm1, %xmm1
    addsd   %xmm1, %xmm0
    ret
    .p2align 4,,10
    .p2align 3
.L5:
    pxor    %xmm0, %xmm0
    ret
    .cfi_endproc

The x86_64 assembly generated by trunk is attached.

Thanks,
Yvan
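
For readers puzzled by the hint in the test case: `(!(n % 128) == 0)` parses as `(!(n % 128)) == 0`, which is true exactly when n is not a multiple of 128, so that path is marked unreachable. The sketch below spells the same hint as a plain `if`. The names `p1_hinted` and `check_p1_hinted` are hypothetical and not part of the original report, and whether this spelling triggers the same optimization on a given revision is an assumption that has not been checked here.

```c
/* Same over-aligned element type as in the test case above. */
typedef double adouble __attribute__ ((__aligned__(16)));

/* Hypothetical variant of p1() with the hint written as a plain `if`. */
double p1_hinted(adouble *x, int n)
{
  double sum = 0.0;

  /* Promise the compiler that n is always a multiple of 128. */
  if (n % 128 != 0)
    __builtin_unreachable();

  for (int i = 0; i < n; i++)
    sum += x[i];
  return sum;
}

/* Hypothetical self-check: returns 1 when p1_hinted() matches an
   integer reference sum, 0 otherwise. */
int check_p1_hinted(void)
{
  enum { N = 256 };  /* a multiple of 128, so the hint holds */
  static double buf[N] __attribute__ ((__aligned__(16)));
  long expected = 0;

  for (int i = 0; i < N; i++) {
    /* Small integers: the sum is exact in any summation order,
       so reassociation under -Ofast cannot change the result. */
    buf[i] = (double)(i % 7);
    expected += i % 7;
  }
  return p1_hinted((adouble *)buf, N) == (double)expected;
}
```

Calling `check_p1_hinted()` from a small driver built with -Ofast -std=c99 only confirms that the function still sums correctly; comparing the assembly for `p1_hinted` at both revisions would be needed to see whether the trip count information is used.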