On Fri, Sep 01, 2017 at 02:22:17PM +0100, Tamar Christina wrote: > Hi All, > > This patch adds the instructions for Dot Product to AArch64 along > with the intrinsics and vectorizer pattern. > > Armv8.2-a dot product supports 8-bit element values both > signed and unsigned. > > Dot product is available from Arm8.2-a and onwards. > > Regtested and bootstrapped on aarch64-none-elf and no issues. > > Ok for trunk? > > gcc/ > 2017-09-01 Tamar Christina <tamar.christ...@arm.com> > > * config/aarch64/aarch64-builtins.c > (aarch64_types_quadopu_lane_qualifiers): New. > (TYPES_QUADOPU_LANE): New. > * config/aarch64/aarch64-simd.md (aarch64_<sur>dot<dot_mode>): New. > (<sur>dot_prod<dot_mode>, aarch64_<sur>dot_lane<dot_mode>): New. > (aarch64_<sur>dot_laneq<dot_mode>): New. > * config/aarch64/aarch64-simd-builtins.def (sdot, udot): New. > (sdot_lane, udot_lane, sdot_laneq, udot_laneq): New. > * config/aarch64/iterators.md (UNSPEC_SDOT, UNSPEC_UDOT): New. > (DOT_MODE, dot_mode, Vdottype, DOTPROD): New. > (sur): Add SDOT and UDOT. > > --
> diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > f3e084f8778d70c82823b92fa80ff96021ad26db..21d46c84ab317c2d62afdf8c48117886aaf483b0 > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -386,6 +386,87 @@ > } > ) > > +;; These instructions map to the __builtins for the Dot Product operations. > +(define_insn "aarch64_<sur>dot<dot_mode>" > + [(set (match_operand:VS 0 "register_operand" "=w") > + (unspec:VS [(match_operand:VS 1 "register_operand" "0") > + (match_operand:<DOT_MODE> 2 "register_operand" "w") > + (match_operand:<DOT_MODE> 3 "register_operand" "w")] > + DOTPROD))] > + "TARGET_DOTPROD" > + "<sur>dot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>" > + [(set_attr "type" "neon_dot")] Would there be a small benefit in modelling this as: [(set (match_operand:VS 0 "register_operand" "=w") (add:VS ((match_operand:VS 1 "register_operand" "0") (unsepc:VS [(match_operand:<DOT_MODE> 2 "register_operand" "w") (match_operand:<DOT_MODE> 3 "register_operand" "w")] DOTPROD)))] > +) > + > +;; These expands map to the Dot Product optab the vectorizer checks for. > +;; The auto-vectorizer expects a dot product builtin that also does an > +;; accumulation into the provided register. > +;; Given the following pattern > +;; > +;; for (i=0; i<len; i++) { > +;; c = a[i] * b[i]; > +;; r += c; > +;; } > +;; return result; > +;; > +;; This can be auto-vectorized to > +;; r = a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]; > +;; > +;; given enough iterations. However the vectorizer can keep unrolling the > loop > +;; r += a[4]*b[4] + a[5]*b[5] + a[6]*b[6] + a[7]*b[7]; > +;; r += a[8]*b[8] + a[9]*b[9] + a[10]*b[10] + a[11]*b[11]; > +;; ... > +;; > +;; and so the vectorizer provides r, in which the result has to be > accumulated. > +(define_expand "<sur>dot_prod<dot_mode>" > + [(set (match_operand:VS 0 "register_operand") > + (unspec:VS [(match_operand:<DOT_MODE> 1 "register_operand") > + (match_operand:<DOT_MODE> 2 "register_operand") > + (match_operand:VS 3 "register_operand")] > + DOTPROD))] This is just an expand that always ends in a DONE, so doesn't need the full description here, just: [(match_operand:VS 0 "register_operand) (match_operand:<DOT_MODE> 1 "register_operand") (match_operand:<DOT_MODE> 2 "register_operand") (match_operand:VS 3 "register_operand")] > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index > cceb57525c7aa44933419bd317b1f03a7b76f4c4..533c12cca916669195e9b094527ee0de31542b12 > 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -354,6 +354,8 @@ > UNSPEC_SQRDMLSH ; Used in aarch64-simd.md. > UNSPEC_FMAXNM ; Used in aarch64-simd.md. > UNSPEC_FMINNM ; Used in aarch64-simd.md. > + UNSPEC_SDOT ; Used in aarch64-simd.md. > + UNSPEC_UDOT ; Used in aarch64-simd.md. > ]) > > ;; ------------------------------------------------------------------ > @@ -810,6 +812,13 @@ > (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")]) > (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")]) > > +;; Mapping attribute for Dot Product input modes based on result mode. > +(define_mode_attr DOT_MODE [(V2SI "V8QI") (V4SI "V16QI")]) > +(define_mode_attr dot_mode [(V2SI "v8qi") (V4SI "v16qi")]) Are these not identical to the two lines above in the context? > (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")]) > (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")]) Thanks, James