http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671
Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |4.7.0
         Resolution|                            |FIXED
   Target Milestone|4.4.7                       |4.7.0
      Known to fail|                            |

--- Comment #9 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-16 12:54:21 UTC ---
We're back to exactly the code from 4.2 on trunk:

.L2:
        movaps  in2(%rax), %xmm0
        mulps   %xmm1, %xmm0
        addps   %xmm2, %xmm0
        mulps   in1(%rax), %xmm0
        movaps  %xmm0, out(%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L2

but I can't reproduce the originally reported assembly with 4.4.0 either
(the report lacks information on flags used besides -march=core2, so I used
both -O2 and -O3 with the same result).

I can confirm that for the testcase in comment #7, from 4.3.x up to 4.6.x
we generate something like

f:
.LFB0:
        .cfi_startproc
        sarl    $2, %esi
        xorl    %eax, %eax
        subl    $1, %esi
        addq    $1, %rsi
        salq    $4, %rsi
        .p2align 4,,10
        .p2align 3
.L2:
        movl    (%rdi,%rax), %ecx
        movl    %ecx, (%rdx,%rax)
        addq    $16, %rax
        cmpq    %rsi, %rax
        jne     .L2

instead of what we generated with 4.2:

f:
.LFB2:
        sarl    $2, %esi
        .p2align 4,,7
.L2:
        movl    (%rdi), %eax
        addq    $16, %rdi
        movl    %eax, (%rdx)
        addq    $16, %rdx
        subl    $1, %esi
        jne     .L2
        rep ; ret

But that, even while not using decrement-and-branch, looks superior to me.

For trunk we now create

f:
.LFB0:
        .cfi_startproc
        sarl    $2, %esi
        .p2align 4,,10
        .p2align 3
.L2:
        movl    (%rdi), %eax
        addq    $16, %rdi
        movl    %eax, (%rdx)
        addq    $16, %rdx
        subl    $1, %esi
        jne     .L2

again.

So, closing as fixed for trunk (or WORKSFORME for the original report).
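
For readers without comment #7 at hand, the two loop shapes compared above can be sketched in C. This is a hypothetical illustration and not the testcase from comment #7: the function names, the stride of four ints (read off "addq $16") and the iteration count of n/4 (read off "sarl $2") are assumptions. A compiler is free to turn either source form into either loop shape, so the code only shows what the two forms compute, not what any GCC release emits for it.

```c
#include <stddef.h>

/* Hypothetical sketch, NOT the testcase from comment #7.
   Both functions copy every fourth int from in[] to out[] and run
   n/4 iterations; like the assembly above, they assume n >= 4. */

/* Shape of the 4.3.x to 4.6.x output: a single induction variable
   indexes both arrays and is compared against a precomputed bound. */
void copy_single_iv(const int *in, int n, int *out)
{
    size_t end = (size_t)(n >> 2) * 4;  /* bound in elements */
    size_t i = 0;
    do {
        out[i] = in[i];
        i += 4;
    } while (i != end);
}

/* Shape of the 4.2 and trunk output: both pointers are bumped and a
   counter is decremented to zero, so the loop needs no separate
   compare against a bound. */
void copy_countdown(const int *in, int n, int *out)
{
    int count = n >> 2;
    do {
        *out = *in;
        in += 4;
        out += 4;
    } while (--count != 0);
}

/* Returns 1 if both loop shapes write the same elements. */
int loops_agree(void)
{
    int in[16], a[16] = {0}, b[16] = {0};
    for (int i = 0; i < 16; i++)
        in[i] = i + 1;
    copy_single_iv(in, 16, a);
    copy_countdown(in, 16, b);
    for (int i = 0; i < 16; i++)
        if (a[i] != b[i])
            return 0;
    /* Elements 0, 4, 8 and 12 are copied; the rest stay zero. */
    return a[0] == 1 && a[4] == 5 && a[12] == 13 && a[1] == 0;
}
```

The first form keeps one live induction variable but uses indexed addressing and a separate cmpq; the second keeps two pointers and a counter but lets the subl set the flags for jne directly.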