> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Wednesday, August 20, 2025 1:48 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de
> Subject: Re: [PATCH 2/5] middle-end: Add detection for add halfing and
> narrowing instruction
>
> On Tue, Aug 19, 2025 at 6:29 AM Tamar Christina <tamar.christ...@arm.com> wrote:
> >
> > This adds support for detection of the ADDHN pattern in the vectorizer.
> >
> > Concretely it tries to detect
> >
> >   _1 = (W)a
> >   _2 = (W)b
> >   _3 = _1 + _2
> >   _4 = _3 >> (precision(a) / 2)
> >   _5 = (N)_4
> >
> > where
> >   W = precision (a) * 2
> >   N = precision (a) / 2
>
> Hmm.  Is the widening because of UB with signed overflow?  The
> actual carry of a + b doesn't end up in (N)(_3 >> (precision(a) / 2)).
> I'd expect that for unsigned a and b you could see just
> (N)((a + b) >> (precision(a) / 2)), no?  Integer promotion would make
> this difficult to write, of course, unless the patterns exist for
> SImode -> HImode add-high.
>
I guess the description is inaccurate: addhn explicitly extracts the high
bits of the result, so the high bits end up in the low part.

> Also ...
>
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no issues.
> >
> > Ok for master? Tests in the next patch which adds the optabs to AArch64.
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >         * internal-fn.def (VEC_ADD_HALFING_NARROW,
> >         IFN_VEC_ADD_HALFING_NARROW_LO, IFN_VEC_ADD_HALFING_NARROW_HI,
> >         IFN_VEC_ADD_HALFING_NARROW_EVEN, IFN_VEC_ADD_HALFING_NARROW_ODD): New.
> >         * internal-fn.cc (commutative_binary_fn_p): Add
> >         IFN_VEC_ADD_HALFING_NARROW, IFN_VEC_ADD_HALFING_NARROW_LO and
> >         IFN_VEC_ADD_HALFING_NARROW_EVEN.
> >         (commutative_ternary_fn_p): Add IFN_VEC_ADD_HALFING_NARROW_HI,
> >         IFN_VEC_ADD_HALFING_NARROW_ODD.
> >         * match.pd (add_half_narrowing_p): New.
> >         * optabs.def (vec_saddh_narrow_optab, vec_saddh_narrow_hi_optab,
> >         vec_saddh_narrow_lo_optab, vec_saddh_narrow_odd_optab,
> >         vec_saddh_narrow_even_optab, vec_uaddh_narrow_optab,
> >         vec_uaddh_narrow_hi_optab, vec_uaddh_narrow_lo_optab,
> >         vec_uaddh_narrow_odd_optab, vec_uaddh_narrow_even_optab): New.
> >         * tree-vect-patterns.cc (gimple_add_half_narrowing_p): New.
> >         (vect_recog_add_halfing_narrow_pattern): New.
> >         (vect_vect_recog_func_ptrs): Use it.
> >         * doc/generic.texi: Document them.
> >         * doc/md.texi: Likewise.
> >
> > ---
> > diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
> > index d4ac580a7a8b9cd339d26cb97f7eb963f83746a4..b32d99d4d1aad244a493d8f67b66151ff5363d0e 100644
> > --- a/gcc/doc/generic.texi
> > +++ b/gcc/doc/generic.texi
> > @@ -1834,6 +1834,11 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
> >  @tindex IFN_VEC_WIDEN_MINUS_LO
> >  @tindex IFN_VEC_WIDEN_MINUS_EVEN
> >  @tindex IFN_VEC_WIDEN_MINUS_ODD
> > +@tindex IFN_VEC_ADD_HALFING_NARROW
> > +@tindex IFN_VEC_ADD_HALFING_NARROW_HI
> > +@tindex IFN_VEC_ADD_HALFING_NARROW_LO
> > +@tindex IFN_VEC_ADD_HALFING_NARROW_EVEN
> > +@tindex IFN_VEC_ADD_HALFING_NARROW_ODD
> >  @tindex VEC_UNPACK_HI_EXPR
> >  @tindex VEC_UNPACK_LO_EXPR
> >  @tindex VEC_UNPACK_FLOAT_HI_EXPR
> > @@ -1956,6 +1961,51 @@ vector of @code{N/2} subtractions.  In the case of
> >  vector are subtracted from the odd @code{N/2} of the first to produce the
> >  vector of @code{N/2} subtractions.
> >
> > +@item IFN_VEC_ADD_HALFING_NARROW
> > +This internal function represents widening vector addition of two input
> > +vectors, extracting the top half of the result and narrowing that value
> > +to a type half the width of the original input.
> > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Its
> > +operands are vectors that contain the same number of elements (@code{N})
> > +of the same integral type.  The result is a vector that contains the same
> > +number (@code{N}) of elements, of an integral type half the width of that
> > +of the input vectors.  If the current target does not implement the
> > +corresponding optabs the vectorizer may choose to split it into either a
> > +pair of @code{IFN_VEC_ADD_HALFING_NARROW_HI} and
> > +@code{IFN_VEC_ADD_HALFING_NARROW_LO} or
> > +@code{IFN_VEC_ADD_HALFING_NARROW_EVEN} and
> > +@code{IFN_VEC_ADD_HALFING_NARROW_ODD}, depending on which optabs the
> > +target implements.
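
To make the per-lane semantics concrete, a minimal scalar model (the types
are chosen purely for illustration: uint16_t inputs, uint8_t result, so
W = 32 and N = 8 in the pattern above):

#include <stdint.h>

/* One lane of the operation: the unsigned widening makes the addition
   overflow-free, and the shift by precision(a)/2 == 8 selects bits 15:8,
   the "high half" of the 16-bit sum, which therefore lands in the low
   bits of the narrow result.  */
static inline uint8_t
addhn_lane (uint16_t a, uint16_t b)
{
  uint32_t sum = (uint32_t) a + (uint32_t) b;  /* _3 = _1 + _2      */
  return (uint8_t) (sum >> 8);                 /* _5 = (N)(_3 >> 8) */
}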
> > +
> > +@item IFN_VEC_ADD_HALFING_NARROW_HI
> > +@itemx IFN_VEC_ADD_HALFING_NARROW_LO
> > +This internal function represents widening vector addition of two input
> > +vectors, extracting the top half of the result and narrowing that value
> > +to a type half the width of the original input, inserting the result as
> > +the high or low half of the result vector.
> > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Their
> > +operands are vectors that contain the same number of elements (@code{N})
> > +of the same integral type.  The result is a vector that contains half as
> > +many elements, of an integral type half the width.  In the case of
> > +@code{IFN_VEC_ADD_HALFING_NARROW_HI} the high @code{N/2} elements of the
> > +result are inserted into the given result vector with the low elements
> > +left untouched.  The operation is an RMW.  In the case of
> > +@code{IFN_VEC_ADD_HALFING_NARROW_LO} the low @code{N/2} elements of the
> > +result are used as the full result.
> > +
> > +@item IFN_VEC_ADD_HALFING_NARROW_EVEN
> > +@itemx IFN_VEC_ADD_HALFING_NARROW_ODD
> > +This internal function represents widening vector addition of two input
> > +vectors, extracting the top half of the result and narrowing that value
> > +to a type half the width of the original input, inserting the result as
> > +the even or odd elements of the result vector.
> > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Their
> > +operands are vectors that contain the same number of elements (@code{N})
> > +of the same integral type.  The result is a vector that contains half as
> > +many elements, of an integral type half the width.  In the case of
> > +@code{IFN_VEC_ADD_HALFING_NARROW_ODD} the odd @code{N/2} elements of the
> > +result are inserted into the given result vector with the even elements
> > +left untouched.  The operation is an RMW.  In the case of
> > +@code{IFN_VEC_ADD_HALFING_NARROW_EVEN} the even @code{N/2} elements of
> > +the result are used as the full result.
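
To picture how the variants cooperate, here is an array-level sketch of the
_LO/_HI pairing (the 8x16-bit inputs narrowing into a 16x8-bit result are an
assumption for illustration).  Note how _HI is handed the partial result and
writes only the high lanes, which is the RMW, ternary shape that comes up
again further down:

#include <stdint.h>

/* _LO: the narrow results form the low half of the result vector.  */
static void
addhn_lo (uint8_t res[16], const uint16_t a[8], const uint16_t b[8])
{
  for (int i = 0; i < 8; i++)
    res[i] = (uint8_t) (((uint32_t) a[i] + b[i]) >> 8);   /* lanes 0..7 */
}

/* _HI: lanes 8..15 are written; lanes 0..7 of RES stay untouched.  */
static void
addhn_hi (uint8_t res[16], const uint16_t a[8], const uint16_t b[8])
{
  for (int i = 0; i < 8; i++)
    res[8 + i] = (uint8_t) (((uint32_t) a[i] + b[i]) >> 8);
}

The _EVEN/_ODD variants interleave instead, writing lanes 0, 2, 4, ... or
1, 3, 5, ... respectively.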
> > +
> >  @item VEC_UNPACK_HI_EXPR
> >  @itemx VEC_UNPACK_LO_EXPR
> >  These nodes represent unpacking of the high and low parts of the input vector,
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index aba93f606eca59d31c103a05b2567fd4f3be55f3..cb691b56f137a0037f5178ba853911df5a65e5a7 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6087,6 +6087,21 @@ vectors with N signed/unsigned elements of size S@.  Find the absolute
> >  difference between operands 1 and 2 and widen the resulting elements.
> >  Put the N/2 results of size 2*S in the output vector (operand 0).
> >
> > +@cindex @code{vec_saddh_narrow_hi_@var{m}} instruction pattern
> > +@cindex @code{vec_saddh_narrow_lo_@var{m}} instruction pattern
> > +@cindex @code{vec_uaddh_narrow_hi_@var{m}} instruction pattern
> > +@cindex @code{vec_uaddh_narrow_lo_@var{m}} instruction pattern
> > +@item @samp{vec_uaddh_narrow_hi_@var{m}}, @samp{vec_uaddh_narrow_lo_@var{m}}
> > +@itemx @samp{vec_saddh_narrow_hi_@var{m}}, @samp{vec_saddh_narrow_lo_@var{m}}
> > +@item @samp{vec_uaddh_narrow_even_@var{m}}, @samp{vec_uaddh_narrow_odd_@var{m}}
> > +@itemx @samp{vec_saddh_narrow_even_@var{m}}, @samp{vec_saddh_narrow_odd_@var{m}}
> > +Signed/Unsigned widening add long, extract high half and narrow.  Operands
> > +1 and 2 are vectors with N signed/unsigned elements of size S@.  Add the
> > +high/low elements of 1 and 2 together in a widened precision, extract the
> > +top half and narrow the result to half the size of S@ and store the results
> > +in the output vector (operand 0).  Concretely it does
> > +@code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.
> > +
> >  @cindex @code{vec_addsub@var{m}3} instruction pattern
> >  @item @samp{vec_addsub@var{m}3}
> >  Alternating subtract, add with even lanes doing subtract and odd
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 83438dd2ff57474cec999adaeabe92c0540e2a51..e600dbc4b3a0b27f78be00d52f7f6a54a13d7241 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4442,6 +4442,9 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_VEC_WIDEN_PLUS_HI:
> >      case IFN_VEC_WIDEN_PLUS_EVEN:
> >      case IFN_VEC_WIDEN_PLUS_ODD:
> > +    case IFN_VEC_ADD_HALFING_NARROW:
> > +    case IFN_VEC_ADD_HALFING_NARROW_LO:
> > +    case IFN_VEC_ADD_HALFING_NARROW_EVEN:
> >        return true;
> >
> >      default:
> > @@ -4462,6 +4465,8 @@ commutative_ternary_fn_p (internal_fn fn)
> >      case IFN_FNMA:
> >      case IFN_FNMS:
> >      case IFN_UADDC:
> > +    case IFN_VEC_ADD_HALFING_NARROW_HI:
> > +    case IFN_VEC_ADD_HALFING_NARROW_ODD:
>
> Huh, how can this be correct?  Are they not binary?

Correct, they're ternary.

> >        return true;
> >
> >      default:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 69677dd10b980c83dec36487b1214ff066f4789b..152895f043b3ca60294b79c8301c6ff4014b955d 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -463,6 +463,12 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> >                                  first,
> >                                  vec_widen_sabd, vec_widen_uabd,
> >                                  binary)
> > +DEF_INTERNAL_NARROWING_OPTAB_FN (VEC_ADD_HALFING_NARROW,
> > +                                 ECF_CONST | ECF_NOTHROW,
> > +                                 first,
> > +                                 vec_saddh_narrow, vec_uaddh_narrow,
> > +                                 binary, ternary)
>
> OK, I guess I should have started to look at 1/n.  Doing that now in
> parallel.
>
> > +
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 66e8a78744931c0137b83c5633c3a273fb69f003..d9d9046a8dcb7e5ca7cdf7c83e1945289950dc51 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3181,6 +3181,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
> > +/* Detect (n)(((w)x + (w)y) >> bitsize(y)) where w is twice the bitsize
> > +   of x and y and n is half the bitsize of x and y.  */
> > +(match (add_half_narrowing_p @0 @1)
> > + (convert1? (rshift (plus:c (convert@3 @0) (convert @1)) INTEGER_CST@2))
>
> why's the outer convert optional?  The checks on n and w would make
> a conversion required I think.  Just use (convert (rshift (... here.

Because match.pd wouldn't let me do it without the optional conversion.
The test on the bitsize essentially mandates it's there anyway.

> > + (with { unsigned n = TYPE_PRECISION (type);
> > +         unsigned w = TYPE_PRECISION (TREE_TYPE (@3));
> > +         unsigned x = TYPE_PRECISION (TREE_TYPE (@0)); }
> > +  (if (INTEGRAL_TYPE_P (type)
> > +       && n == x / 2
>
> Now, because of weird types it would be safer to check n * 2 == x,
> just in case of odd x ...
>
> Alternatively/additionally check && type_has_mode_precision_p (type)
>
> > +       && w == x * 2
> > +       && wi::eq_p (wi::to_wide (@2), x / 2)))))
> > +
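
As a worked instance of these precision checks (types again illustrative):
for uint16_t operands x = 16, so the predicate requires w = 32, a shift
constant of x / 2 = 8 and a result precision n = 8.  The odd-x concern is
about truncating division: for a hypothetical 15-bit x, x / 2 == 7, so a
7-bit n passes n == x / 2 even though 7 * 2 != 15, while n * 2 == x would
reject it.  A source-level shape that maps onto the captures:

#include <stdint.h>

/* @0 = a, @1 = b (x = 16); @3 = wa (w = 32); @2 = 8; result has n = 8.  */
static uint8_t
add_half_narrow (uint16_t a, uint16_t b)
{
  uint32_t wa = (uint32_t) a;          /* (convert@3 @0)                   */
  uint32_t wb = (uint32_t) b;          /* (convert @1)                     */
  return (uint8_t) ((wa + wb) >> 8);   /* (convert (rshift (plus ...) @2)) */
}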
> >  /* Saturation add for unsigned integer.  */
> >  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
> >   (match (usadd_overflow_mask @0 @1)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 87a8b85da1592646d0a3447572e842ceb158cd97..e226d85ddba7e43dd801faec61cac0372286314a 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -492,6 +492,16 @@ OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")
> >  OPTAB_D (vec_widen_uabd_lo_optab, "vec_widen_uabd_lo_$a")
> >  OPTAB_D (vec_widen_uabd_odd_optab, "vec_widen_uabd_odd_$a")
> >  OPTAB_D (vec_widen_uabd_even_optab, "vec_widen_uabd_even_$a")
> > +OPTAB_D (vec_saddh_narrow_optab, "vec_saddh_narrow$a")
> > +OPTAB_D (vec_saddh_narrow_hi_optab, "vec_saddh_narrow_hi_$a")
> > +OPTAB_D (vec_saddh_narrow_lo_optab, "vec_saddh_narrow_lo_$a")
> > +OPTAB_D (vec_saddh_narrow_odd_optab, "vec_saddh_narrow_odd_$a")
> > +OPTAB_D (vec_saddh_narrow_even_optab, "vec_saddh_narrow_even_$a")
> > +OPTAB_D (vec_uaddh_narrow_optab, "vec_uaddh_narrow$a")
> > +OPTAB_D (vec_uaddh_narrow_hi_optab, "vec_uaddh_narrow_hi_$a")
> > +OPTAB_D (vec_uaddh_narrow_lo_optab, "vec_uaddh_narrow_lo_$a")
> > +OPTAB_D (vec_uaddh_narrow_odd_optab, "vec_uaddh_narrow_odd_$a")
> > +OPTAB_D (vec_uaddh_narrow_even_optab, "vec_uaddh_narrow_even_$a")
> >  OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> >  OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> >  OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index ffb320fbf2330522f25a9f4380f4744079a42306..b590c36fad23e44ec3fb954a4d2bb856ce3fc139 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -4768,6 +4768,64 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> >    return NULL;
> >  }
> >
> > +extern bool gimple_add_half_narrowing_p (tree, tree*, tree (*)(tree));
> > +
> > +/* Try to detect the add halfing and narrowing pattern:
> > +
> > +     _1 = (W)a
> > +     _2 = (W)b
> > +     _3 = _1 + _2
> > +     _4 = _3 >> (precision(a) / 2)
> > +     _5 = (N)_4
> > +
> > +   where
> > +     W = precision (a) * 2
> > +     N = precision (a) / 2  */
> > +
> > +static gimple *
> > +vect_recog_add_halfing_narrow_pattern (vec_info *vinfo,
> > +                                       stmt_vec_info stmt_vinfo,
> > +                                       tree *type_out)
> > +{
> > +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > +
> > +  if (!is_gimple_assign (last_stmt))
> > +    return NULL;
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (last_stmt);
> > +
> > +  if (gimple_add_half_narrowing_p (lhs, ops, NULL))
> > +    {
> > +      tree itype = TREE_TYPE (ops[0]);
> > +      tree otype = TREE_TYPE (lhs);
> > +      tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> > +      tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> > +      internal_fn ifn = IFN_VEC_ADD_HALFING_NARROW;
> > +
> > +      if (v_itype != NULL_TREE && v_otype != NULL_TREE
> > +          && direct_internal_fn_supported_p (ifn, v_itype, OPTIMIZE_FOR_BOTH))
>
> why have the HI/LO and EVEN/ODD variants when you check for
> IFN_VEC_ADD_HALFING_NARROW only?
>

Because without HI/LO we would have to pass quite a few arguments to the
actual instruction.  VEC_ADD_HALFING_NARROW does arithmetic as well, so the
inputs are spread out over the operands.  VEC_ADD_HALFING_NARROW would
require 4 inputs, where the first two and the last two are used together.
This would be completely unclear from the use of the instruction itself.

I could, but then it also means that if you have a narrowing instruction
which needs 3 inputs, the IFN needs 6.  It did not seem logical to do so.
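
To sketch the objection (hypothetical signatures only, nothing here is what
the patch proposes; the GCC vector-extension typedefs just make it
self-contained):

#include <stdint.h>

typedef uint16_t v8hi  __attribute__ ((vector_size (16)));
typedef uint8_t  v16qi __attribute__ ((vector_size (16)));
typedef uint8_t  v8qi  __attribute__ ((vector_size (8)));

/* Combined form: four inputs with an implicit pairing, operands 1/2
   feeding the low half and 3/4 the high half; a 3-input narrowing
   operation would likewise balloon to 6 operands.  */
v16qi vec_add_halfing_narrow (v8hi a_lo, v8hi b_lo, v8hi a_hi, v8hi b_hi);

/* Split form: each call's operands belong to a single operation, and
   _HI takes the partial result explicitly (the ternary, RMW shape).  */
v8qi  vec_add_halfing_narrow_lo (v8hi a, v8hi b);
v16qi vec_add_halfing_narrow_hi (v8qi lo, v8hi a, v8hi b);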
The alternative would have been to use just two inputs and use VEC_PERM_EXPR
to combine them.  This would work for HI/LO, but would then require backends
to recognize the permute back into hi/lo operations, taking endianness into
account.  Possible, but it seemed a roundabout way of doing it.

Secondly, it doesn't work for even/odd: VEC_PERM would fill in only a
strided part of the vector at a time.  This becomes difficult for VLA, and
then you have to do tricks like discounting the cost of the permute when it
follows an instruction you have an even/odd variant of.

Concretely, using just VEC_ADD_HALFING_NARROW creates more issues than it
solves, but if you want that variant I will respin.

Tamar

> > +        {
> > +          gcall *call = gimple_build_call_internal (ifn, 2, ops[0], ops[1]);
> > +          tree in_ssa = vect_recog_temp_ssa_var (otype, NULL);
> > +
> > +          gimple_call_set_lhs (call, in_ssa);
> > +          gimple_call_set_nothrow (call, /* nothrow_p */ false);
> > +          gimple_set_location (call,
> > +                               gimple_location (STMT_VINFO_STMT (stmt_vinfo)));
> > +
> > +          *type_out = v_otype;
> > +          vect_pattern_detected ("vect_recog_add_halfing_narrow_pattern",
> > +                                 last_stmt);
> > +          return call;
> > +        }
> > +    }
> > +
> > +  return NULL;
> > +}
> > +
> >  /* Detect a signed division by a constant that wouldn't be
> >     otherwise vectorized:
> >
> > @@ -6896,6 +6954,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> >    { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
> >    { vect_recog_bit_insert_pattern, "bit_insert" },
> >    { vect_recog_abd_pattern, "abd" },
> > +  { vect_recog_add_halfing_narrow_pattern, "addhn" },
> >    { vect_recog_over_widening_pattern, "over_widening" },
> >    /* Must come after over_widening, which narrows the shift as much as
> >       possible beforehand.  */
> >
> >
> > --
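
P.S. for the archives: the source shape the new recognizer is aimed at is a
loop of roughly this form (illustrative only; whether it actually ends up as
addhn depends on the later patches in the series adding the optabs):

#include <stdint.h>

void
narrow_high_sum (uint8_t *restrict r, const uint16_t *a,
                 const uint16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = (uint8_t) (((uint32_t) a[i] + b[i]) >> 8);
}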