http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |4.7.0
         Resolution|                            |FIXED
   Target Milestone|4.4.7                       |4.7.0
      Known to fail|                            |

--- Comment #9 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-16 12:54:21 UTC ---
On trunk we're back to exactly the code from 4.2:

.L2:
        movaps  in2(%rax), %xmm0
        mulps   %xmm1, %xmm0
        addps   %xmm2, %xmm0
        mulps   in1(%rax), %xmm0
        movaps  %xmm0, out(%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L2

but I can't reproduce the originally reported assembly with 4.4.0 either
(the report lacks information on the flags used besides -march=core2, so I
tried both -O2 and -O3, with the same result).
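For readers without the original testcase, here is a minimal C sketch of a loop that computes what the vectorized .L2 body above computes. It is a hypothetical reconstruction, not the reporter's source: the names `kernel`, `a`, `b` and `N` are assumptions, with `a` and `b` standing in for the loop-invariant values in %xmm1 and %xmm2. Each .L2 iteration loads 4 floats from in2, multiplies by %xmm1, adds %xmm2, multiplies by in1 and stores 16 bytes to out, which corresponds to four iterations of the scalar loop.

```c
#include <assert.h>

#define N 1024  /* hypothetical array length */

/* Hypothetical globals: the listing addresses in1, in2 and out as symbols
   offset by %rax, so they are modelled as 16-byte-aligned file-scope arrays
   (movaps requires 16-byte alignment). */
_Alignas(16) float in1[N];
_Alignas(16) float in2[N];
_Alignas(16) float out[N];

/* Hypothetical source loop; a and b play the role of %xmm1 and %xmm2. */
void kernel(int n, float a, float b)
{
    /* keep n a multiple of the 4-float vector width so the vector loop
       alone covers the whole range */
    assert(n % 4 == 0 && n <= N);
    for (int i = 0; i < n; i++)
        out[i] = (in2[i] * a + b) * in1[i];  /* mulps, addps, mulps */
}
```

With in1[k] = k, in2[k] = 1, a = 2 and b = 3, every out[k] becomes 5 * k.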

I can confirm that for the testcase in comment #7 we generate, from 4.3.x
up to 4.6.x, something like

f:
.LFB0:
        .cfi_startproc
        sarl    $2, %esi
        xorl    %eax, %eax
        subl    $1, %esi
        addq    $1, %rsi
        salq    $4, %rsi
        .p2align 4,,10
        .p2align 3
.L2:
        movl    (%rdi,%rax), %ecx
        movl    %ecx, (%rdx,%rax)
        addq    $16, %rax
        cmpq    %rsi, %rax
        jne     .L2

instead of what we generated with 4.2:

f:
.LFB2:
        sarl    $2, %esi
        .p2align 4,,7
.L2:
        movl    (%rdi), %eax
        addq    $16, %rdi
        movl    %eax, (%rdx)
        addq    $16, %rdx
        subl    $1, %esi
        jne     .L2
        rep ; ret

But that, even while not using decrement-and-branch, looks superior to me.
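To make the difference between the two listings concrete, here is a minimal C sketch of the 4.3.x to 4.6.x loop shape. It assumes, from the register usage and the x86-64 SysV calling convention, that the comment #7 function takes (src, n, dst) in %rdi, %esi and %rdx; the name `f_indexed` and the parameter names are hypothetical, since the testcase itself is not reproduced here. One index (%rax, counted in bytes) addresses both arrays, and the trip count n >> 2 is turned into an end offset before the loop. The listing enters the loop without a zero-trip check, so the sketch is a do-while that requires n >= 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of the 4.3.x-4.6.x loop: a single index shared
   by both accesses, compared against a bound computed before the loop. */
void f_indexed(const int *src, int n, int *dst)
{
    assert(n >= 4);  /* like the listing, no zero-trip check */

    /* sarl $2; subl $1 (zero-extends into %rsi); addq $1 */
    size_t niter = (size_t)(unsigned)((n >> 2) - 1) + 1;
    size_t end = niter * 4;  /* salq $4 yields the same bound in bytes */
    size_t i = 0;            /* xorl %eax, %eax */

    do {
        dst[i] = src[i];     /* movl (%rdi,%rax); movl (%rdx,%rax) */
        i += 4;              /* addq $16, %rax: 4 ints = 16 bytes */
    } while (i != end);      /* cmpq %rsi, %rax; jne .L2 */
}
```

Only every fourth element is copied: for n = 16 the stores go to dst[0], dst[4], dst[8] and dst[12].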
On trunk we now generate

f:
.LFB0:
        .cfi_startproc
        sarl    $2, %esi
        .p2align 4,,10
        .p2align 3
.L2:
        movl    (%rdi), %eax
        addq    $16, %rdi
        movl    %eax, (%rdx)
        addq    $16, %rdx
        subl    $1, %esi
        jne     .L2

again.
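The shape that 4.2 generated, and that trunk generates again, can be sketched the same way, under the same assumed (src, n, dst) argument order; `f_pointer` is a hypothetical name. Instead of one shared index compared against a precomputed end, each pointer is bumped by 16 bytes and a separate counter, initialised to n >> 2, runs down to zero. As in the listing, there is no zero-trip check, so n >= 4 is required.

```c
#include <assert.h>

/* Hypothetical C rendering of the 4.2 / trunk loop: both pointers are
   advanced and a separate down-counter controls the exit. */
void f_pointer(const int *src, int n, int *dst)
{
    assert(n >= 4);           /* like the listing, no zero-trip check */
    int count = n >> 2;       /* sarl $2, %esi */

    do {
        *dst = *src;          /* movl (%rdi), %eax; movl %eax, (%rdx) */
        src += 4;             /* addq $16, %rdi */
        dst += 4;             /* addq $16, %rdx */
    } while (--count != 0);   /* subl $1, %esi; jne .L2 */
}
```

Counting instructions in the two listings, this form has one more instruction in the loop body (two pointer increments plus the decrement, versus one index increment plus the compare), but its only setup before the loop is the shift.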

So, closing as fixed for trunk (or WORKSFORME for the original report).
