Alejandro Martinez Vicente <alejandro.martinezvice...@arm.com> writes:
> +;; Helper expander for aarch64_<su>abd<mode>_3 to save the callers
> +;; the hassle of constructing the other arm of the MINUS.
> +(define_expand "<su>abd<mode>_3"
> +  [(use (match_operand:SVE_I 0 "register_operand"))
> +   (USMAX:SVE_I (match_operand:SVE_I 1 "register_operand")
> +             (match_operand:SVE_I 2 "register_operand"))]
> +  "TARGET_SVE"
> +  {
> +    rtx other_arm
> +      = simplify_gen_binary (<MAX_OPP>, <MODE>mode, operands[1], 
> operands[2]);

I realise this is just copied from the Advanced SIMD version, but
simplify_gen_binary is a bit dangerous here, since we explicitly want an
unsimplified <MAX_OPP> with the two operands given.  Probably better as:

  gen_rtx_<MAX_OPP> (<MODE>mode, ...)

> +    emit_insn (gen_aarch64_<su>abd<mode>_3 (operands[0], operands[1],
> +            operands[2], other_arm));
> +    DONE;
> +  }
> +)
> +
> +;; Unpredicated integer absolute difference.
> +(define_expand "aarch64_<su>abd<mode>_3"
> +  [(set (match_operand:SVE_I 0 "register_operand")
> +     (unspec:SVE_I
> +       [(match_dup 4)
> +        (minus:SVE_I
> +          (USMAX:SVE_I
> +            (match_operand:SVE_I 1 "register_operand" "w")
> +            (match_operand:SVE_I 2 "register_operand" "w"))
> +          (match_operator 3 "aarch64_<max_opp>"
> +            [(match_dup 1)
> +             (match_dup 2)]))]
> +       UNSPEC_MERGE_PTRUE))]
> +  "TARGET_SVE"
> +  {
> +    operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
> +  }
> +)

I think we should go directly from <su>abd<mode>_3 to the final
pattern, so that <su>abd<mode>_3 does the force_reg too.  This
would make...

> +;; Predicated integer absolute difference.
> +(define_insn "*aarch64_<su>abd<mode>_3"

...this the named pattern, instead of starting with "*".

> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
> +     (unspec:SVE_I
> +       [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
> +        (minus:SVE_I
> +          (USMAX:SVE_I
> +            (match_operand:SVE_I 2 "register_operand" "w, w")

Should be "0, w", so that the first alternative ties the input
to the output.

> +            (match_operand:SVE_I 3 "register_operand" "w, w"))
> +          (match_operator 4 "aarch64_<max_opp>"
> +            [(match_dup 2)
> +             (match_dup 3)]))]
> +       UNSPEC_MERGE_PTRUE))]
> +  "TARGET_SVE"
> +  "@
> +   <su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
> +   movprfx\t%0, %2\;<su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Emit a sequence to produce a sum-of-absolute-differences of the inputs in
> +;; operands 1 and 2.  The sequence also has to perform a widening reduction 
> of
> +;; the difference into a vector and accumulate that into operand 3 before
> +;; copying that into the result operand 0.
> +;; Perform that with a sequence of:
> +;; MOV               ones.b, #1
> +;; UABD              diff.b, p0/m, op1.b, op2.b
> +;; UDOT              op3.s, diff.b, ones.b
> +;; MOV               op0, op3  // should be eliminated in later passes.
> +;; The signed version just uses the signed variants of the above 
> instructions.

Think it would be clearer if we removed the last line and just used
[SU]ABD instead of UABD, since that's the only sign-dependent part
of the operation.  Also think we should SVEize it with MOVPRFX,
since a separate MOV should never be needed:

;; MOV          ones.b, #1
;; [SU]ABD      diff.b, ptrue/m, op1.b, op2.b
;; MOVPRFX      op0, op3                // If necessary
;; UDOT         op0.s, diff.b, ones.b

> +(define_expand "<sur>sad<vsi2qi>"
> +  [(use (match_operand:SVE_SDI 0 "register_operand"))
> +   (unspec:<VSI2QI> [(use (match_operand:<VSI2QI> 1 "register_operand"))
> +                 (use (match_operand:<VSI2QI> 2 "register_operand"))] ABAL)
> +   (use (match_operand:SVE_SDI 3 "register_operand"))]
> +  "TARGET_SVE"
> +  {
> +    rtx ones = force_reg (<VSI2QI>mode, CONST1_RTX (<VSI2QI>mode));
> +    rtx diff = gen_reg_rtx (<VSI2QI>mode);
> +    emit_insn (gen_<sur>abd<vsi2qi>_3 (diff, operands[1], operands[2]));
> +    emit_insn (gen_udot_prod<vsi2qi> (operands[3], diff, ones, operands[3]));
> +    emit_move_insn (operands[0], operands[3]);

It would be better to make operands[0] the destination of the UDOT and
drop the move.

Thanks,
Richard

> +    DONE;
> +  }
> +)
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index b3b2d6e..20aa0e9 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -1060,6 +1060,9 @@
>  ;; Map smax to smin and umax to umin.
>  (define_code_attr max_opp [(smax "smin") (umax "umin")])
>  
> +;; Same as above, but louder.
> +(define_code_attr MAX_OPP [(smax "SMIN") (umax "UMIN")])

:-)

Reply via email to