https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89445

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
        vmovupd (%rsi,%rax), %zmm1{%k1}{z}
        addq    %rdx, %rax
        vmovupd (%rax), %zmm2{%k1}{z}
        vfmadd132pd     %zmm0, %zmm2, %zmm1
        vmovupd %zmm1, (%rax){%k1}
isn't optimal, btw. It would be nice if we could merge the masked load of
(%rax) into the vfmadd132pd instruction and put the masking there, like:
        vmovupd (%rsi,%rax), %zmm1{%k1}{z}
        addq    %rdx, %rax
        vfmadd132pd     (%rax), %zmm2, %zmm1{%k1}{z}
        vmovupd %zmm1, (%rax){%k1}
but not really sure how to achieve that.
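
To see why folding the mask into the FMA should be safe, here is a rough
scalar model in C of the two sequences. It is only a sketch: the names
(maskz_load, mask_store, fma_separate, fma_merged) are made up for
illustration, zmm0 is assumed to hold the per-lane multiplier, and a plain
multiply-add stands in for the fused operation. Lanes that are masked off
never reach memory through the masked store, so whatever they hold after the
FMA does not matter.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* doubles in one 512-bit register */

/* Models a zero-masked load: inactive lanes become 0.0, memory is not read. */
static void maskz_load(double dst[LANES], const double *src, uint8_t k)
{
    for (size_t i = 0; i < LANES; i++)
        dst[i] = ((k >> i) & 1) ? src[i] : 0.0;
}

/* Models a masked store: only active lanes are written back. */
static void mask_store(double *dst, const double src[LANES], uint8_t k)
{
    for (size_t i = 0; i < LANES; i++)
        if ((k >> i) & 1)
            dst[i] = src[i];
}

/* Current code shape: two masked loads, unmasked FMA, masked store. */
void fma_separate(double *y, const double *x, const double a[LANES], uint8_t k)
{
    double z1[LANES], z2[LANES];
    maskz_load(z1, x, k);
    maskz_load(z2, y, k);
    for (size_t i = 0; i < LANES; i++)
        z1[i] = z1[i] * a[i] + z2[i]; /* stand-in for the fused multiply-add */
    mask_store(y, z1, k);
}

/* Wished-for shape: the FMA reads y itself and carries the zeroing mask. */
void fma_merged(double *y, const double *x, const double a[LANES], uint8_t k)
{
    double z1[LANES];
    maskz_load(z1, x, k);
    for (size_t i = 0; i < LANES; i++)
        z1[i] = ((k >> i) & 1) ? z1[i] * a[i] + y[i] : 0.0;
    mask_store(y, z1, k);
}
```

Both functions leave the same contents in y for any mask k. The only
difference is the value left in the inactive lanes of the temporary, which
is never stored.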
