https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56766
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
This can now be seen in gfortran.dg/vect/fast-math-pr37021.f90 as well, which
produces

.L14:
        movupd  (%r11), %xmm3
        addl    $1, %ecx
        addq    %rax, %r11
        movupd  (%r8), %xmm0
        addq    %rax, %r8
        unpckhpd        %xmm3, %xmm3
        movupd  (%rdi), %xmm2
        unpcklpd        %xmm0, %xmm0
        addq    %rsi, %rdi
        movupd  (%rbx), %xmm1
        mulpd   %xmm3, %xmm2
        addq    %rsi, %rbx
        cmpl    %ecx, %ebp
        palignr $8, %xmm1, %xmm1
        mulpd   %xmm1, %xmm0
        movapd  %xmm2, %xmm1
        addpd   %xmm0, %xmm1
        subpd   %xmm2, %xmm0
        shufpd  $2, %xmm0, %xmm1
        addpd   %xmm1, %xmm4
        jne     .L14

Note the

        addpd   %xmm0, %xmm1
        subpd   %xmm2, %xmm0
        shufpd  $2, %xmm0, %xmm1

which should be

        addsubpd        %xmm2, %xmm1

It happens to work for v4sf mode.  I think the vec_merge RTX code should
either go away or we should canonicalize the other variants to vec_merge
properly.  For a target-specific fix, a second addsubv2df3 pattern catching
the (vec_select:V2DF (vec_merge:V4DF ...)) case could be added.
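
For reference, a minimal standalone sketch (mine, not from the testcase) of
what the missed combination amounts to: the loop above computes a full addpd
and a full subpd and merges one lane from each with shufpd, which is exactly
what a single SSE3 addsubpd computes when the lanes line up:

#include <pmmintrin.h>  /* SSE3: _mm_addsub_pd */
#include <stdio.h>

int
main (void)
{
  __m128d a = _mm_set_pd (3.0, 10.0);  /* lanes: a = { 10.0, 3.0 } */
  __m128d b = _mm_set_pd (2.0, 4.0);   /* lanes: b = { 4.0, 2.0 }  */

  /* One instruction: { a0 - b0, a1 + b1 }.  */
  __m128d fused = _mm_addsub_pd (a, b);

  /* What the loop above does instead: a full add, a full sub, then a
     shufpd that keeps the sub in lane 0 and the add in lane 1.  */
  __m128d sum = _mm_add_pd (a, b);
  __m128d dif = _mm_sub_pd (a, b);
  __m128d merged = _mm_shuffle_pd (dif, sum, 2);  /* { dif0, sum1 } */

  double f[2], m[2];
  _mm_storeu_pd (f, fused);
  _mm_storeu_pd (m, merged);
  printf ("fused  = { %g, %g }\nmerged = { %g, %g }\n",
          f[0], f[1], m[0], m[1]);
  return 0;
}

Compiled with gcc -O2 -msse3, both vectors come out as { 6, 5 }.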
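
And for context, a hypothetical C analogue (not the actual Fortran test
source) of the kind of complex multiply-accumulate reduction the testcase
exercises; at -O3 -ffast-math the naive complex multiply is vectorized, and
its real/imaginary combine is where addsubpd should show up:

#include <complex.h>

double complex
cdot (const double complex *a, const double complex *b, int n)
{
  double complex s = 0.0;
  for (int i = 0; i < n; i++)
    /* (ar*br - ai*bi) + (ar*bi + ai*br)*I: the sub on the real lane
       paired with the add on the imaginary lane is the addsub
       candidate.  */
    s += a[i] * b[i];
  return s;
}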