https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89445
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
        vmovupd (%rsi,%rax), %zmm1{%k1}{z}
        addq    %rdx, %rax
        vmovupd (%rax), %zmm2{%k1}{z}
        vfmadd132pd     %zmm0, %zmm2, %zmm1
        vmovupd %zmm1, (%rax){%k1}
isn't optimal, btw. It would be nice if we could merge that masking into the
vfmadd132pd instruction, like:
        vmovupd (%rsi,%rax), %zmm1{%k1}{z}
        addq    %rdx, %rax
        vfmadd132pd     (%rax), %zmm2, %zmm1{%k1}{z}
        vmovupd %zmm1, (%rax){%k1}
but I'm not really sure how to achieve that.
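
For illustration only, here is a scalar C model of the two sequences above. It is a sketch under assumptions, not what GCC emits or how the fold would be implemented: the names a, b and c (for the data at (%rsi,%rax), the value in %zmm0 and the data at (%rax)) and all helper functions are hypothetical, and the model only covers the values that end up in memory, not faults or timing. It shows that because the final store uses the same mask, zero-masking the FMA and reading its addend straight from memory leaves the same bytes as the separate masked load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8 /* one zmm register holds 8 doubles */

/* Model of "vmovupd mem, %zmm{%k}{z}": masked-off lanes become 0.0. */
static void maskz_load(double dst[LANES], uint8_t k, const double *mem) {
    for (int i = 0; i < LANES; i++)
        dst[i] = ((k >> i) & 1) ? mem[i] : 0.0;
}

/* Model of "vmovupd %zmm, mem{%k}": masked-off lanes of mem stay untouched. */
static void mask_store(double *mem, uint8_t k, const double src[LANES]) {
    for (int i = 0; i < LANES; i++)
        if ((k >> i) & 1)
            mem[i] = src[i];
}

/* Current sequence: two masked loads, an unmasked FMA, a masked store. */
static void separate_load(double *c, const double *a,
                          const double b[LANES], uint8_t k) {
    double z1[LANES], z2[LANES];
    maskz_load(z1, k, a);
    maskz_load(z2, k, c);
    for (int i = 0; i < LANES; i++)
        z1[i] = z1[i] * b[i] + z2[i];
    mask_store(c, k, z1);
}

/* Hypothetical folded sequence: the FMA reads c directly and is itself
   zero-masked, so masked-off lanes of c are never used. */
static void folded_load(double *c, const double *a,
                        const double b[LANES], uint8_t k) {
    double z1[LANES];
    maskz_load(z1, k, a);
    for (int i = 0; i < LANES; i++)
        z1[i] = ((k >> i) & 1) ? z1[i] * b[i] + c[i] : 0.0;
    mask_store(c, k, z1);
}

/* Returns 1 if both sequences leave identical memory for mask k. */
int sequences_agree(uint8_t k) {
    const double a[LANES] = {1, 2, 3, 4, 5, 6, 7, 8};
    const double b[LANES] = {2, 2, 2, 2, 2, 2, 2, 2};
    double c1[LANES] = {10, 20, 30, 40, 50, 60, 70, 80};
    double c2[LANES];
    memcpy(c2, c1, sizeof c1);
    separate_load(c1, a, b, k);
    folded_load(c2, a, b, k);
    return memcmp(c1, c2, sizeof c1) == 0;
}

/* Checks every possible 8-bit mask value. */
void check_all_masks(void) {
    for (int k = 0; k < 256; k++)
        assert(sequences_agree((uint8_t)k));
}
```

Calling check_all_masks() runs the comparison for all 256 mask values. The sample inputs are small integers so the multiply-add is exact and the comparison does not depend on whether the compiler contracts it into a real FMA.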