https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56766

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
This can now be seen in gfortran.dg/vect/fast-math-pr37021.f90 as well, which
produces

.L14:
        movupd  (%r11), %xmm3
        addl    $1, %ecx
        addq    %rax, %r11
        movupd  (%r8), %xmm0
        addq    %rax, %r8
        unpckhpd        %xmm3, %xmm3
        movupd  (%rdi), %xmm2
        unpcklpd        %xmm0, %xmm0
        addq    %rsi, %rdi
        movupd  (%rbx), %xmm1
        mulpd   %xmm3, %xmm2
        addq    %rsi, %rbx
        cmpl    %ecx, %ebp
        palignr $8, %xmm1, %xmm1
        mulpd   %xmm1, %xmm0
        movapd  %xmm2, %xmm1
        addpd   %xmm0, %xmm1
        subpd   %xmm2, %xmm0
        shufpd  $2, %xmm0, %xmm1
        addpd   %xmm1, %xmm4
        jne     .L14

note the

        addpd   %xmm0, %xmm1
        subpd   %xmm2, %xmm0
        shufpd  $2, %xmm0, %xmm1

which should be

        addsubpd %xmm2, %xmm1

It happens to work for v4sf mode.

I think the vec_merge RTX code should either go away or we should canonicalize
the other variants to vec_merge properly.

For a target-specific fix, a second addsubv2df3 pattern catching the
(vec_select:V2DF (vec_merge:V4DF ...)) case could be added.
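Such a pattern might look roughly like the following sketch (hypothetical, not actual sse.md source; the operands and the vec_select parallel are elided, and GCC's existing addsub patterns instead match a vec_merge of plus and minus):

```
;; Hypothetical sketch only: a second define_insn for addsubv2df3
;; matching the vec_select/vec_merge form combine builds here.
(define_insn "*sse3_addsubv2df3_alt"
  [(set (match_operand:V2DF 0 "register_operand" "=x")
        (vec_select:V2DF
          (vec_merge:V4DF ...)
          (parallel [...])))]
  "TARGET_SSE3"
  "addsubpd\t{...}")
```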
