http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |4.7.0
         Resolution|                            |FIXED
   Target Milestone|4.4.7                       |4.7.0
      Known to fail|                            |
--- Comment #9 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-16 12:54:21 UTC ---
We're back to exactly the code from 4.2 on trunk:
.L2:
        movaps in2(%rax), %xmm0
        mulps %xmm1, %xmm0
        addps %xmm2, %xmm0
        mulps in1(%rax), %xmm0
        movaps %xmm0, out(%rax)
        addq $16, %rax
        cmpq %rdx, %rax
        jne .L2
but I can't reproduce the originally reported assembly with 4.4.0 either
(the report lacks information on flags used besides -march=core2, so I used
both -O2 and -O3 with the same result).
I can confirm that for the testcase in comment #7, from 4.3.x up to 4.6.x,
we generate something like
f:
.LFB0:
        .cfi_startproc
        sarl $2, %esi
        xorl %eax, %eax
        subl $1, %esi
        addq $1, %rsi
        salq $4, %rsi
        .p2align 4,,10
        .p2align 3
.L2:
        movl (%rdi,%rax), %ecx
        movl %ecx, (%rdx,%rax)
        addq $16, %rax
        cmpq %rsi, %rax
        jne .L2
instead of what we generated with 4.2:
f:
.LFB2:
        sarl $2, %esi
        .p2align 4,,7
.L2:
        movl (%rdi), %eax
        addq $16, %rdi
        movl %eax, (%rdx)
        addq $16, %rdx
        subl $1, %esi
        jne .L2
        rep ; ret
But that, even while not using decrement-and-branch, looks superior to me.
For trunk we now create
f:
.LFB0:
        .cfi_startproc
        sarl $2, %esi
        .p2align 4,,10
        .p2align 3
.L2:
        movl (%rdi), %eax
        addq $16, %rdi
        movl %eax, (%rdx)
        addq $16, %rdx
        subl $1, %esi
        jne .L2
again.
So, closing as fixed for trunk (or WORKSFORME for the original report).
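
For readers without comment #7 at hand, the two loop shapes above can be
sketched at the C level roughly as follows. This is a hypothetical
reconstruction read back from the assembly, not the actual testcase: the
function names, parameter names and the n >= 4 precondition are assumptions
made for illustration. Both functions copy every fourth int and differ only
in which induction variables drive the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the testcase from comment #7. */

/* Shape of the 4.3.x to 4.6.x loop: a single offset indexes both arrays
   and is compared against a precomputed end value (cmpq/jne). */
void copy_indexed(const int *in, int n, int *out)
{
    assert(n >= 4);                     /* assumed: at least one iteration */
    size_t end = (size_t)(n >> 2) * 4;  /* element index where the loop stops */
    size_t i = 0;
    do {
        out[i] = in[i];
        i += 4;                         /* 4 ints = the 16-byte step in the asm */
    } while (i != end);
}

/* Shape of the 4.2 and trunk loop: both pointers are advanced and a
   separate counter is decremented and tested (subl/jne). */
void copy_pointers(const int *in, int n, int *out)
{
    assert(n >= 4);                     /* assumed: at least one iteration */
    int count = n >> 2;
    do {
        *out = *in;
        in += 4;
        out += 4;
    } while (--count != 0);
}
```

The first form keeps two live loop variables (offset and end) and uses
indexed addressing; the second keeps three (two pointers and a counter) but
uses plain addressing and lets the decrement set the flags for the branch.
Which form the compiler actually emits for a given source depends on the
version and flags, as the listings above show.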