On Thu, Jun 24, 2021 at 1:07 PM Richard Biener <[email protected]> wrote:
> This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
> instructions which compute { v0[0] - v1[0], v0[1] + v1[1], ... },
> thus subtract, add alternating on lanes, starting with subtract.
>
> It adds a corresponding optab and direct internal function,
> vec_addsub$a3 and renames the existing i386 backend patterns to
> the new canonical name.
>
> The SLP pattern matches the exact alternating lane sequence rather
> than trying to be clever and anticipating incoming permutes - we
> could permute the two input vectors to the needed lane alternation,
> do the addsub and then permute the result vector back but that's
> only profitable in case the two input permutes or the output permute will
> vanish - something Tamar's refactoring of SLP pattern recog should
> make possible.
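For reference, the alternating subtract/add described above is what the SSE3
_mm_addsub_ps intrinsic exposes.  A minimal C sketch of the lane semantics,
assuming <pmmintrin.h>; the helper names are illustrative only, not part of
the patch:

#include <pmmintrin.h>   /* SSE3 */

/* Scalar reference for vec_addsub: subtract on even lanes, add on odd,
   starting with a subtract in lane 0.  */
static void
addsub_ref (float *r, const float *a, const float *b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = (i & 1) ? a[i] + b[i] : a[i] - b[i];
}

/* The same operation on four float lanes via the SSE3 intrinsic.  */
static __m128
addsub_ps (__m128 a, __m128 b)
{
  return _mm_addsub_ps (a, b);
}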
Using the attached patch, I was also able to generate addsub for the
following testcase:
float x[2], y[2], z[2];
void foo ()
{
  x[0] = y[0] - z[0];
  x[1] = y[1] + z[1];
}
vmovq y(%rip), %xmm0
vmovq z(%rip), %xmm1
vaddsubps %xmm1, %xmm0, %xmm0
vmovlps %xmm0, x(%rip)
ret
Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e887f03474d..5f10572718d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -788,6 +788,24 @@ (define_insn "*mmx_haddsubv2sf3"
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
+(define_insn "vec_addsubv2sf3"
+ [(set (match_operand:V2SF 0 "register_operand" "=x,x")
+ (vec_merge:V2SF
+ (minus:V2SF
+ (match_operand:V2SF 1 "register_operand" "0,x")
+ (match_operand:V2SF 2 "register_operand" "x,x"))
+ (plus:V2SF (match_dup 1) (match_dup 2))
+ (const_int 1)))]
+ "TARGET_SSE3 && TARGET_MMX_WITH_SSE"
+ "@
+ addsubps\t{%2, %0|%0, %2}
+ vaddsubps\t{%2, %1, %0|%0, %1, %2}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sseadd")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "prefix_rep" "1,*")
+ (set_attr "mode" "V4SF")])
+
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; Parallel single-precision floating point comparisons
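As an aside on the vec_merge in the new pattern: the const_int 1 mask selects
lane 0 from the first arm (the minus) and the remaining lane from the second
arm (the plus), following the usual vec_merge bit-mask semantics.  A
hand-written C equivalent of the V2SF case, purely for illustration:

/* Lane 0 (mask bit 0 set) comes from the minus arm, lane 1 (bit clear)
   from the plus arm.  */
static void
vec_addsubv2sf_ref (float r[2], const float a[2], const float b[2])
{
  r[0] = a[0] - b[0];
  r[1] = a[1] + b[1];
}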