Alejandro Martinez Vicente <alejandro.martinezvice...@arm.com> writes: > +;; Helper expander for aarch64_<su>abd<mode>_3 to save the callers > +;; the hassle of constructing the other arm of the MINUS. > +(define_expand "<su>abd<mode>_3" > + [(use (match_operand:SVE_I 0 "register_operand")) > + (USMAX:SVE_I (match_operand:SVE_I 1 "register_operand") > + (match_operand:SVE_I 2 "register_operand"))] > + "TARGET_SVE" > + { > + rtx other_arm > + = simplify_gen_binary (<MAX_OPP>, <MODE>mode, operands[1], > operands[2]);
I realise this is just copied from the Advanced SIMD version, but simplify_gen_binary is a bit dangerous here, since we explicitly want an unsimplified <MAX_OPP> with the two operands given. Probably better as: gen_rtx_<MAX_OPP> (<MODE>mode, ...) > + emit_insn (gen_aarch64_<su>abd<mode>_3 (operands[0], operands[1], > + operands[2], other_arm)); > + DONE; > + } > +) > + > +;; Unpredicated integer absolute difference. > +(define_expand "aarch64_<su>abd<mode>_3" > + [(set (match_operand:SVE_I 0 "register_operand") > + (unspec:SVE_I > + [(match_dup 4) > + (minus:SVE_I > + (USMAX:SVE_I > + (match_operand:SVE_I 1 "register_operand" "w") > + (match_operand:SVE_I 2 "register_operand" "w")) > + (match_operator 3 "aarch64_<max_opp>" > + [(match_dup 1) > + (match_dup 2)]))] > + UNSPEC_MERGE_PTRUE))] > + "TARGET_SVE" > + { > + operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode)); > + } > +) I think we should go directly from <su>abd<mode>_3 to the final pattern, so that <su>abd<mode>_3 does the force_reg too. This would make... > +;; Predicated integer absolute difference. > +(define_insn "*aarch64_<su>abd<mode>_3" ...this the named pattern, instead of starting with "*". > + [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w") > + (unspec:SVE_I > + [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl") > + (minus:SVE_I > + (USMAX:SVE_I > + (match_operand:SVE_I 2 "register_operand" "w, w") Should be "0, w", so that the first alternative ties the input to the output. > + (match_operand:SVE_I 3 "register_operand" "w, w")) > + (match_operator 4 "aarch64_<max_opp>" > + [(match_dup 2) > + (match_dup 3)]))] > + UNSPEC_MERGE_PTRUE))] > + "TARGET_SVE" > + "@ > + <su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> > + movprfx\t%0, %2\;<su>abd\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>" > + [(set_attr "movprfx" "*,yes")] > +) > + > +;; Emit a sequence to produce a sum-of-absolute-differences of the inputs in > +;; operands 1 and 2. The sequence also has to perform a widening reduction > of > +;; the difference into a vector and accumulate that into operand 3 before > +;; copying that into the result operand 0. > +;; Perform that with a sequence of: > +;; MOV ones.b, #1 > +;; UABD diff.b, p0/m, op1.b, op2.b > +;; UDOT op3.s, diff.b, ones.b > +;; MOV op0, op3 // should be eliminated in later passes. > +;; The signed version just uses the signed variants of the above > instructions. Think it would be clearer if we removed the last line and just used [SU]ABD instead of UABD, since that's the only sign-dependent part of the operation. Also think we should SVEize it with MOVPRFX, since a separate MOV should never be needed: ;; MOV ones.b, #1 ;; [SU]ABD diff.b, ptrue/m, op1.b, op2.b ;; MOVPRFX op0, op3 // If necessary ;; UDOT op0.s, diff.b, ones.b > +(define_expand "<sur>sad<vsi2qi>" > + [(use (match_operand:SVE_SDI 0 "register_operand")) > + (unspec:<VSI2QI> [(use (match_operand:<VSI2QI> 1 "register_operand")) > + (use (match_operand:<VSI2QI> 2 "register_operand"))] ABAL) > + (use (match_operand:SVE_SDI 3 "register_operand"))] > + "TARGET_SVE" > + { > + rtx ones = force_reg (<VSI2QI>mode, CONST1_RTX (<VSI2QI>mode)); > + rtx diff = gen_reg_rtx (<VSI2QI>mode); > + emit_insn (gen_<sur>abd<vsi2qi>_3 (diff, operands[1], operands[2])); > + emit_insn (gen_udot_prod<vsi2qi> (operands[3], diff, ones, operands[3])); > + emit_move_insn (operands[0], operands[3]); It would be better to make operands[0] the destination of the UDOT and drop the move. Thanks, Richard > + DONE; > + } > +) > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index b3b2d6e..20aa0e9 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -1060,6 +1060,9 @@ > ;; Map smax to smin and umax to umin. > (define_code_attr max_opp [(smax "smin") (umax "umin")]) > > +;; Same as above, but louder. > +(define_code_attr MAX_OPP [(smax "SMIN") (umax "UMIN")]) :-)