On Tue, Aug 19, 2025 at 6:29 AM Tamar Christina <tamar.christ...@arm.com> wrote:
>
> This adds support for detection of the ADDHN pattern in the vectorizer.
>
> Concretely, try to detect
>
>   _1 = (W)a
>   _2 = (W)b
>   _3 = _1 + _2
>   _4 = _3 >> (precision(a) / 2)
>   _5 = (N)_4
>
> where
>   W = precision (a) * 2
>   N = precision (a) / 2
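To make sure I'm reading the pattern right, a scalar C equivalent would be
something like the following (my own sketch; precision (a) == 16 picked
arbitrarily, so W is a 32-bit type and N an 8-bit one):

  #include <stdint.h>

  uint8_t
  addhn_like (uint16_t a, uint16_t b)
  {
    uint32_t sum = (uint32_t) a + (uint32_t) b;  /* _1, _2, _3 */
    uint32_t tmp = sum >> 8;                     /* _4: precision (a) / 2 == 8 */
    return (uint8_t) tmp;                        /* _5: narrow to N */
  }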
Hmm.  Is the widening because of UB with signed overflow?  The actual carry
of a + b doesn't end up in (N)(_3 >> (precision(a) / 2)).  I'd expect that
for unsigned a and b you could see just (N)((a + b) >> (precision(a) / 2)),
no?  Integer promotion would make this difficult to write, of course, unless
the patterns exist for SImode -> HImode add-high.

Also ...

> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master? Tests in the next patch which adds the optabs to AArch64.
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>	* internal-fn.def (VEC_ADD_HALFING_NARROW,
>	IFN_VEC_ADD_HALFING_NARROW_LO, IFN_VEC_ADD_HALFING_NARROW_HI,
>	IFN_VEC_ADD_HALFING_NARROW_EVEN, IFN_VEC_ADD_HALFING_NARROW_ODD): New.
>	* internal-fn.cc (commutative_binary_fn_p): Add
>	IFN_VEC_ADD_HALFING_NARROW, IFN_VEC_ADD_HALFING_NARROW_LO and
>	IFN_VEC_ADD_HALFING_NARROW_EVEN.
>	(commutative_ternary_fn_p): Add IFN_VEC_ADD_HALFING_NARROW_HI,
>	IFN_VEC_ADD_HALFING_NARROW_ODD.
>	* match.pd (add_half_narrowing_p): New.
>	* optabs.def (vec_saddh_narrow_optab, vec_saddh_narrow_hi_optab,
>	vec_saddh_narrow_lo_optab, vec_saddh_narrow_odd_optab,
>	vec_saddh_narrow_even_optab, vec_uaddh_narrow_optab,
>	vec_uaddh_narrow_hi_optab, vec_uaddh_narrow_lo_optab,
>	vec_uaddh_narrow_odd_optab, vec_uaddh_narrow_even_optab): New.
>	* tree-vect-patterns.cc (gimple_add_half_narrowing_p): New.
>	(vect_recog_add_halfing_narrow_pattern): New.
>	(vect_vect_recog_func_ptrs): Use it.
>	* doc/generic.texi: Document them.
>	* doc/md.texi: Likewise.
>
> ---
> diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
> index d4ac580a7a8b9cd339d26cb97f7eb963f83746a4..b32d99d4d1aad244a493d8f67b66151ff5363d0e 100644
> --- a/gcc/doc/generic.texi
> +++ b/gcc/doc/generic.texi
> @@ -1834,6 +1834,11 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
>  @tindex IFN_VEC_WIDEN_MINUS_LO
>  @tindex IFN_VEC_WIDEN_MINUS_EVEN
>  @tindex IFN_VEC_WIDEN_MINUS_ODD
> +@tindex IFN_VEC_ADD_HALFING_NARROW
> +@tindex IFN_VEC_ADD_HALFING_NARROW_HI
> +@tindex IFN_VEC_ADD_HALFING_NARROW_LO
> +@tindex IFN_VEC_ADD_HALFING_NARROW_EVEN
> +@tindex IFN_VEC_ADD_HALFING_NARROW_ODD
>  @tindex VEC_UNPACK_HI_EXPR
>  @tindex VEC_UNPACK_LO_EXPR
>  @tindex VEC_UNPACK_FLOAT_HI_EXPR
> @@ -1956,6 +1961,51 @@ vector of @code{N/2} subtractions.  In the case of
>  vector are subtracted from the odd @code{N/2} of the first to produce the
>  vector of @code{N/2} subtractions.
>
> +@item IFN_VEC_ADD_HALFING_NARROW
> +This internal function represents widening vector addition of two input
> +vectors, extracting the top half of the result and narrowing that value to
> +a type half the width of the original input.
> +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Its
> +operands are vectors that contain the same number of elements (@code{N}) of
> +the same integral type.  The result is a vector that contains the same
> +number of elements (@code{N}), of an integral type half the width of that
> +of the input vectors.  If the current target does not implement the
> +corresponding optabs the vectorizer may choose to split it into either a
> +pair of @code{IFN_VEC_ADD_HALFING_NARROW_HI} and
> +@code{IFN_VEC_ADD_HALFING_NARROW_LO}
> +or @code{IFN_VEC_ADD_HALFING_NARROW_EVEN} and
> +@code{IFN_VEC_ADD_HALFING_NARROW_ODD}, depending on what optabs the target
> +implements.
> +
> +@item IFN_VEC_ADD_HALFING_NARROW_HI
> +@itemx IFN_VEC_ADD_HALFING_NARROW_LO
> +This internal function represents widening vector addition of two input
> +vectors, extracting the top half of the result, narrowing that value to a
> +type half the width of the original input and inserting the result as the
> +high or low half of the result vector.
> +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Their
> +operands are vectors that contain the same number of elements (@code{N}) of
> +the same integral type.  The result is a vector that contains half as many
> +elements, of an integral type half the width of that of the input vectors.
> +In the case of @code{IFN_VEC_ADD_HALFING_NARROW_HI} the high @code{N/2}
> +elements of the result are inserted into the given result vector with the
> +low elements left untouched.  The operation is an RMW.  In the case of
> +@code{IFN_VEC_ADD_HALFING_NARROW_LO} the low @code{N/2} elements of the
> +result are used as the full result.
> +
> +@item IFN_VEC_ADD_HALFING_NARROW_EVEN
> +@itemx IFN_VEC_ADD_HALFING_NARROW_ODD
> +This internal function represents widening vector addition of two input
> +vectors, extracting the top half of the result, narrowing that value to a
> +type half the width of the original input and inserting the result as the
> +even or odd elements of the result vector.
> +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Their
> +operands are vectors that contain the same number of elements (@code{N}) of
> +the same integral type.  The result is a vector that contains half as many
> +elements, of an integral type half the width of that of the input vectors.
> +In the case of @code{IFN_VEC_ADD_HALFING_NARROW_ODD} the odd @code{N/2}
> +elements of the result are inserted into the given result vector with the
> +even elements left untouched.  The operation is an RMW.  In the case of
> +@code{IFN_VEC_ADD_HALFING_NARROW_EVEN} the even @code{N/2} elements of the
> +result are used as the full result.
> +
>  @item VEC_UNPACK_HI_EXPR
>  @itemx VEC_UNPACK_LO_EXPR
>  These nodes represent unpacking of the high and low parts of the input vector,
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index aba93f606eca59d31c103a05b2567fd4f3be55f3..cb691b56f137a0037f5178ba853911df5a65e5a7 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6087,6 +6087,21 @@ vectors with N signed/unsigned elements of size S@.  Find the absolute
>  difference between operands 1 and 2 and widen the resulting elements.
>  Put the N/2 results of size 2*S in the output vector (operand 0).
>
> +@cindex @code{vec_saddh_narrow_hi_@var{m}} instruction pattern
> +@cindex @code{vec_saddh_narrow_lo_@var{m}} instruction pattern
> +@cindex @code{vec_uaddh_narrow_hi_@var{m}} instruction pattern
> +@cindex @code{vec_uaddh_narrow_lo_@var{m}} instruction pattern
> +@item @samp{vec_uaddh_narrow_hi_@var{m}}, @samp{vec_uaddh_narrow_lo_@var{m}}
> +@itemx @samp{vec_saddh_narrow_hi_@var{m}}, @samp{vec_saddh_narrow_lo_@var{m}}
> +@item @samp{vec_uaddh_narrow_even_@var{m}}, @samp{vec_uaddh_narrow_odd_@var{m}}
> +@itemx @samp{vec_saddh_narrow_even_@var{m}}, @samp{vec_saddh_narrow_odd_@var{m}}
> +Signed/Unsigned widening add, extract high half and narrow.  Operands 1 and
> +2 are vectors with N signed/unsigned elements of size S@.  Add the high/low
> +(or even/odd) elements of operands 1 and 2 together in a widened precision,
> +extract the top half of each result, narrow it to half the size of S@ and
> +store the results in the output vector (operand 0).  Concretely it does
> +@code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.
> +
>  @cindex @code{vec_addsub@var{m}3} instruction pattern
>  @item @samp{vec_addsub@var{m}3}
>  Alternating subtract, add with even lanes doing subtract and odd
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 83438dd2ff57474cec999adaeabe92c0540e2a51..e600dbc4b3a0b27f78be00d52f7f6a54a13d7241 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4442,6 +4442,9 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_VEC_WIDEN_PLUS_HI:
>      case IFN_VEC_WIDEN_PLUS_EVEN:
>      case IFN_VEC_WIDEN_PLUS_ODD:
> +    case IFN_VEC_ADD_HALFING_NARROW:
> +    case IFN_VEC_ADD_HALFING_NARROW_LO:
> +    case IFN_VEC_ADD_HALFING_NARROW_EVEN:
>        return true;
>
>      default:
> @@ -4462,6 +4465,8 @@ commutative_ternary_fn_p (internal_fn fn)
>      case IFN_FNMA:
>      case IFN_FNMS:
>      case IFN_UADDC:
> +    case IFN_VEC_ADD_HALFING_NARROW_HI:
> +    case IFN_VEC_ADD_HALFING_NARROW_ODD:

Huh, how can this be correct?  Are they not binary?

>        return true;
>
>      default:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 69677dd10b980c83dec36487b1214ff066f4789b..152895f043b3ca60294b79c8301c6ff4014b955d 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -463,6 +463,12 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
>  				first,
>  				vec_widen_sabd, vec_widen_uabd,
>  				binary)
> +DEF_INTERNAL_NARROWING_OPTAB_FN (VEC_ADD_HALFING_NARROW,
> +				 ECF_CONST | ECF_NOTHROW,
> +				 first,
> +				 vec_saddh_narrow, vec_uaddh_narrow,
> +				 binary, ternary)

OK, I guess I should have started to look at 1/n.  Doing that now in
parallel.

> +
>  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 66e8a78744931c0137b83c5633c3a273fb69f003..d9d9046a8dcb7e5ca7cdf7c83e1945289950dc51 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3181,6 +3181,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
>
> +/* Detect (n)(((w)x + (w)y) >> bitsize(y)) where w is twice the bitsize of
> +   x and y and n is half the bitsize of x and y.  */
> +(match (add_half_narrowing_p @0 @1)
> + (convert1? (rshift (plus:c (convert@3 @0) (convert @1)) INTEGER_CST@2))

Why's the outer convert optional?  The checks on n and w would make a
conversion required I think.  Just use (convert (rshift (... here.

> + (with { unsigned n = TYPE_PRECISION (type);
> +	 unsigned w = TYPE_PRECISION (TREE_TYPE (@3));
> +	 unsigned x = TYPE_PRECISION (TREE_TYPE (@0)); }
> +  (if (INTEGRAL_TYPE_P (type)
> +       && n == x / 2

Now, because of weird types it would be safer to check n * 2 == x, just in
case of odd x (for x == 9, n == x / 2 accepts n == 4 even though 4 * 2 != 9).
Alternatively/additionally check && type_has_mode_precision_p (type).

> +       && w == x * 2
> +       && wi::eq_p (wi::to_wide (@2), x / 2)))))
> +
>  /* Saturation add for unsigned integer.  */
>  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
>   (match (usadd_overflow_mask @0 @1)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 87a8b85da1592646d0a3447572e842ceb158cd97..e226d85ddba7e43dd801faec61cac0372286314a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -492,6 +492,16 @@ OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")
>  OPTAB_D (vec_widen_uabd_lo_optab, "vec_widen_uabd_lo_$a")
>  OPTAB_D (vec_widen_uabd_odd_optab, "vec_widen_uabd_odd_$a")
>  OPTAB_D (vec_widen_uabd_even_optab, "vec_widen_uabd_even_$a")
> +OPTAB_D (vec_saddh_narrow_optab, "vec_saddh_narrow$a")
> +OPTAB_D (vec_saddh_narrow_hi_optab, "vec_saddh_narrow_hi_$a")
> +OPTAB_D (vec_saddh_narrow_lo_optab, "vec_saddh_narrow_lo_$a")
> +OPTAB_D (vec_saddh_narrow_odd_optab, "vec_saddh_narrow_odd_$a")
> +OPTAB_D (vec_saddh_narrow_even_optab, "vec_saddh_narrow_even_$a")
> +OPTAB_D (vec_uaddh_narrow_optab, "vec_uaddh_narrow$a")
> +OPTAB_D (vec_uaddh_narrow_hi_optab, "vec_uaddh_narrow_hi_$a")
> +OPTAB_D (vec_uaddh_narrow_lo_optab, "vec_uaddh_narrow_lo_$a")
> +OPTAB_D (vec_uaddh_narrow_odd_optab, "vec_uaddh_narrow_odd_$a")
> +OPTAB_D (vec_uaddh_narrow_even_optab, "vec_uaddh_narrow_even_$a")
>  OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
>  OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
>  OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index ffb320fbf2330522f25a9f4380f4744079a42306..b590c36fad23e44ec3fb954a4d2bb856ce3fc139 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4768,6 +4768,64 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
>    return NULL;
>  }
>
> +extern bool gimple_add_half_narrowing_p (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to detect the add halfing and narrowing pattern.
> + *
> + *   _1 = (W)a
> + *   _2 = (W)b
> + *   _3 = _1 + _2
> + *   _4 = _3 >> (precision(a) / 2)
> + *   _5 = (N)_4
> + *
> + * where
> + *   W = precision (a) * 2
> + *   N = precision (a) / 2
> + */
> +
> +static gimple *
> +vect_recog_add_halfing_narrow_pattern (vec_info *vinfo,
> +				       stmt_vec_info stmt_vinfo,
> +				       tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +    return NULL;
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +
> +  if (gimple_add_half_narrowing_p (lhs, ops, NULL))
> +    {
> +      tree itype = TREE_TYPE (ops[0]);
> +      tree otype = TREE_TYPE (lhs);
> +      tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +      tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> +      internal_fn ifn = IFN_VEC_ADD_HALFING_NARROW;
> +
> +      if (v_itype != NULL_TREE && v_otype != NULL_TREE
> +	  && direct_internal_fn_supported_p (ifn, v_itype, OPTIMIZE_FOR_BOTH))

Why have the HI/LO and EVEN/ODD variants when you check for
IFN_VEC_ADD_HALFING_NARROW only?
> +	{
> +	  gcall *call = gimple_build_call_internal (ifn, 2, ops[0], ops[1]);
> +	  tree in_ssa = vect_recog_temp_ssa_var (otype, NULL);
> +
> +	  gimple_call_set_lhs (call, in_ssa);
> +	  gimple_call_set_nothrow (call, /* nothrow_p */ false);
> +	  gimple_set_location (call,
> +			       gimple_location (STMT_VINFO_STMT (stmt_vinfo)));
> +
> +	  *type_out = v_otype;
> +	  vect_pattern_detected ("vect_recog_add_halfing_narrow_pattern",
> +				 last_stmt);
> +	  return call;
> +	}
> +    }
> +
> +  return NULL;
> +}
> +
>  /* Detect a signed division by a constant that wouldn't be
>     otherwise vectorized:
>
> @@ -6896,6 +6954,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
>    { vect_recog_bit_insert_pattern, "bit_insert" },
>    { vect_recog_abd_pattern, "abd" },
> +  { vect_recog_add_halfing_narrow_pattern, "addhn" },
>    { vect_recog_over_widening_pattern, "over_widening" },
>    /* Must come after over_widening, which narrows the shift as much as
>       possible beforehand.  */
> --
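To illustrate the carry point from the top of the mail: for unsigned inputs
the widening shouldn't be needed at all, since the final truncation discards
the carry either way.  A scalar sketch (my own code, with uint32_t inputs
picked so integer promotion doesn't get in the way):

  #include <stdint.h>

  /* Widened form, matching the pattern as posted.  */
  uint16_t
  addhn_wide (uint32_t a, uint32_t b)
  {
    uint64_t sum = (uint64_t) a + (uint64_t) b;
    return (uint16_t) (sum >> 16);
  }

  /* Same-precision form.  The carry (bit 32 of the widened sum) would
     land in bit 16 of the shifted value, which the truncation to 16 bits
     discards anyway, so the two functions agree for all inputs.  */
  uint16_t
  addhn_same (uint32_t a, uint32_t b)
  {
    return (uint16_t) ((a + b) >> 16);
  }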