https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2019-05-23 00:00:00         |2025-2-12
                 CC|                            |konstantinos.eleftheriou@vr
                   |                            |ull.eu, law at gcc dot gnu.org

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
Assembly with -O3 -march=skylake is still

loop:
.LFB0:
        .cfi_startproc
        movslq  %edi, %rdi
        vbroadcastsd    %xmm0, %ymm1
        vmovddup        %xmm0, %xmm0
        vmulpd  a(,%rdi,8), %ymm1, %ymm1
        vxorpd  %xmm4, %xmm4, %xmm4
        vmovupd %ymm1, r(%rip)      <--- Full store (offset from the load)
        vmulpd  a+32(,%rdi,8), %xmm0, %xmm0
        vmovupd %xmm0, r+32(%rip)   <--- Store upper half
        vmovupd r+16(%rip), %ymm2   <--- STLF fail
        vextractf128    $0x1, %ymm2, %xmm3
        vunpckhpd       %xmm3, %xmm3, %xmm0
        vaddsd  %xmm4, %xmm0, %xmm0
        vunpckhpd       %xmm2, %xmm2, %xmm4
        vaddsd  %xmm3, %xmm0, %xmm0
        vunpckhpd       %xmm1, %xmm1, %xmm3
        vaddsd  %xmm4, %xmm0, %xmm0
        vaddsd  %xmm2, %xmm0, %xmm0
        vaddsd  %xmm3, %xmm0, %xmm0
        vaddsd  %xmm0, %xmm1, %xmm0
        vzeroupper
        ret

With -favoid-store-forwarding enabled, this is split as

        vmulpd  a(,%rdi,8), %ymm1, %ymm1
        vmovupd %ymm1, r(%rip)
        vmulpd  a+32(,%rdi,8), %xmm0, %xmm0
        vmovupd r+16(%rip), %ymm5
        vmovapd %ymm5, -32(%rsp)
        vmovapd %xmm0, -16(%rsp)
        vmovapd -32(%rsp), %ymm6
        vmovupd %xmm0, r+32(%rip)

But that is even worse: the offset load from r+16 is still there, and the
32-byte reload from -32(%rsp) spans the two preceding stack stores, so it
does not forward either.  So we replaced one STLF failure with two.

Ugh.
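
For readers without the testcase at hand, here is a minimal C sketch of a
function that matches the access pattern in the listing above.  It is
reconstructed from the assembly, not quoted from the report: the names
loop, a and r come from the listing, while the array sizes and the
initial values of a are assumptions made only so the sketch builds on
its own.

```c
/* Hypothetical definitions so the sketch links on its own; the size of a
   and its contents are assumptions, not taken from the report.  */
double a[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
double r[6];

double
loop (int k, double x)
{
  double t = 0;
  int i;

  /* Six stores to r.  In the listing these become one 32-byte store to
     r[0..3] and one 16-byte store to r[4..5].  */
  for (i = 0; i < 6; i++)
    r[i] = x * a[i + k];

  /* Reverse-order reduction over r.  In the listing r[2..5] is read back
     with a single 32-byte load at r+16, which overlaps both stores above
     and so cannot be satisfied from either one alone.  */
  for (i = 0; i < 6; i++)
    t += r[5 - i];

  return t;
}
```

Whether a given compiler version emits exactly the listing above for this
sketch is not guaranteed; it only illustrates the overlapping store/load
shape under discussion.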
