On Wed, 20 Aug 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Wednesday, August 20, 2025 1:48 PM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; rdsandif...@googlemail.com; nd <n...@arm.com>
> > Subject: Re: [PATCH 1/5]middle-end: Add scaffolding to support narrowing IFNs
> >
> > On Tue, 19 Aug 2025, Tamar Christina wrote:
> >
> > > This adds scaffolding for supporting narrowing IFNs inside the
> > > vectorizer in a similar way as how widening is supported.  However,
> > > because narrowing operations always have the same number of elements
> > > in the input and output, we need to be able to combine the results.
> > > One way this could have been done is by using a vec_perm_expr, but
> > > this then can become tricky to recognize as low/hi pairs in backends.
> > >
> > > As such I've chosen the design where the _hi and _odd variants of the
> > > instructions must always be RMW.  This simplifies the implementation,
> > > and targets that don't want this can use the direct conversion variant.
> >
> > the canonical way for "narrowing" would be to have a
> >
> >  vec_pack_saddh_optab
> >
> > that takes two input vectors for each operand (we currently have such
> > for conversions, aka the single operand case).  There's no hi/lo
> > involved, that's only for widening as we can't have two outputs.
> >
> > So - no, we don't want this new odd way of doing it.  Either only go with
> > vec_saddh_narrow, aka the result mode is of half size, if that suits
> > you, or please add the first "pack" variant of a binary operation.
> >
> > "pack" would imply narrow here.  Alternatively vec_pack_narrow_saddh
> > and vec_narrow_saddh as the two variants.
> >
> > Note that for composition I'd use a CTOR.  Note that in your scheme
> > the even/odd variant would interleave one result into the other?
> > Would the binary optab then fill only every 2nd output lane?
> > The documentation in 2/n isn't exactly clear here.
>
> We've discussed a lot of this on IRC and I believe a lot of this is a
> misunderstanding related to how the current optabs are documented.  As our
> current documentation does, I have put the detailed documentation with the
> IFN rather than the optab.
>
> This was following the convention seemingly established by the widening
> variants.
>
> But to come back to what we ended up with on IRC.
>
> I proposed
>
> FOO -> {V4SI, V4SI} -> {V4HI} and FOO_LO -> {V4SI, V4SI} -> V4HI,
> FOO_MERGE_HI -> {V4SI, V4SI, V4HI} -> V8HI
>
> And you proposed back
>
> FOO_LO -> {V4SI, V4SI} -> V8HI as well
> FOO_EVEN -> {V4SI, V4SI} -> V8HI
>
> because the x86 variant of these instructions returns registers of the same
> size as the inputs.  However, the documentation of these instructions [1]
> states that for the AVX variants the upper half is zeroed, and for SSE the
> upper half is undefined.
>
> This to me seems like it means that x86 does not have _HI/_LO variants of
> these instructions and that we're making a change to accommodate an ISA
> that can't support these instructions.  I believe x86 only has FOO.
>
> And for FOO I don't think we should return V8HI, because on SSE the top bits
> are undefined.  You proposed I introduce an explicit zeroing first, but SSE
> and AVX are not consistent here.  I believe this should return V4HI because
> those are the only bits whose value *all* ISAs specify, and it won't have an
> issue with endianness.
>
> We discussed EVEN and ODD as well.  I think again there is a fundamental
> issue with EVEN/ODD, one that wasn't encountered because, well, nothing uses
> the code.  EVEN/ODD, unlike HI/LO, cannot be detected as a pattern, because
> the lanes to permute around just may not be there *unless* you unroll.  And
> patterns can't force an unroll, since unroll factors are determined after
> all pattern matching.
>
> That means they can only be detected later on.
> This means EVEN/ODD detection fundamentally cannot rely on, or relate to,
> the generic FOO.  This patch does not ascribe any definition or
> implementation to EVEN/ODD aside from requiring that ODD be RMW.
>
> I don't think this can be described any differently, because widening
> *reads* and narrowing *writes*.  So FOO_EVEN -> {V4SI, V4SI} -> V8HI
> is already true in this patch, because again, it's not used, so the modes
> can be anything.  I only added it because for some reason EVEN/ODD
> was required before even though there isn't an implementation for it.  I
> didn't add that requirement, I'm just following that convention.
>
> So again I propose
>
> FOO -> {V4SI, V4SI} -> {V4HI} and FOO_LO -> {V4SI, V4SI} -> V4HI,
> FOO_MERGE_HI -> {V4SI, V4SI, V4HI} -> V8HI
>
> FOO_LO -> {V4SI, V4SI} -> V8HI is not a native operation for AArch64,
> x86 maps naturally to FOO, and it doesn't have endianness issues, so I
> don't see a good reason to complicate the implementation for the target
> that can actually support it.
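
For readers following the shapes quoted above, here is a minimal scalar sketch in plain C++ of FOO_LO -> {V4SI, V4SI} -> V4HI and FOO_MERGE_HI -> {V4SI, V4SI, V4HI} -> V8HI.  It is an illustration only and not code from the patch.  The per-lane operation `foo_lane` (add, then keep the high 16 bits) is an assumed stand-in for FOO, the `std::array` typedefs merely mimic the vector modes, all function names are hypothetical, and lanes are numbered as on little-endian.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for the vector modes named in the thread (assumption: unsigned lanes).
using V4SI = std::array<std::uint32_t, 4>;
using V4HI = std::array<std::uint16_t, 4>;
using V8HI = std::array<std::uint16_t, 8>;

// Hypothetical per-lane operation standing in for FOO:
// add two 32-bit lanes and keep the high 16 bits of the sum.
static std::uint16_t foo_lane(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint16_t>((a + b) >> 16);
}

// FOO_LO -> {V4SI, V4SI} -> V4HI: four narrowed results, nothing else.
V4HI foo_lo(const V4SI &a, const V4SI &b) {
  V4HI r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = foo_lane(a[i], b[i]);
  return r;
}

// FOO_MERGE_HI -> {V4SI, V4SI, V4HI} -> V8HI: read-modify-write.
// The existing low half is preserved, the new results fill the high half.
V8HI foo_merge_hi(const V4SI &a, const V4SI &b, const V4HI &lo) {
  V8HI r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = lo[i];
    r[i + 4] = foo_lane(a[i], b[i]);
  }
  return r;
}

// Small self-check tying the two halves together.
void sanity_check() {
  V4SI a{0x10000u, 0x20000u, 0x30000u, 0x40000u};
  V4SI b{0u, 0u, 0u, 0x10000u};
  V8HI r = foo_merge_hi(a, b, foo_lo(a, b));
  assert(r[0] == r[4] && r[3] == r[7]);
}
```

In this model the V4HI result is fully defined on its own, and only the merge step depends on which half is considered "high".
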

You said that on aarch64 foo_lo zeroes the high part of V8HI.  We also
raised the issue of endianness, where for big-endian the meaning of _hi
and _lo swaps.  So for big-endian the _lo would be the merging operation.
So I was proposing to have FOO_MERGE_LO and FOO_MERGE_HI, necessarily both
{V4SI, V4SI} -> V8HI then, and define_insns that would only allow
all-zero to-merge-into for aarch64 merge_lo.

Note I did not check at all whether x86 actually has instructions
doing addh - you apparently did, which one is it?

Your FOO -> {V4SI, V4SI} -> {V4HI} is what you can use for convenience
in place of FOO_LO with V4HI input.  That's also conveniently
endianness invariant.

Can you provide links to the documentation on the aarch64 ISA for addh?

Richard.

> Thanks,
> Tamar
>
> [1] https://www.felixcloutier.com/x86/addpd
>
> >
> > Thanks,
> > Richard.
> >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* internal-fn.cc (lookup_hilo_internal_fn,
> > > 	DEF_INTERNAL_NARROWING_OPTAB_FN, lookup_evenodd_internal_fn,
> > > 	narrowing_fn_p, narrowing_evenodd_fn_p): New.
> > > 	* internal-fn.def (DEF_INTERNAL_NARROWING_OPTAB_FN): New.
> > > 	* internal-fn.h (narrowing_fn_p, narrowing_evenodd_fn_p): New.
> > > 	* tree-vect-stmts.cc (simple_integer_narrowing, vectorizable_call,
> > > 	vectorizable_conversion, supportable_widening_operation,
> > > 	supportable_narrowing_operation): Use it.
> > > 	* tree-vectorizer.h (supportable_narrowing_operation): Modify
> > > 	signature.
> > > > > > --- > > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > > > index > > bf2fac8180706ec418de7eb97cd1260f1d078c03..83438dd2ff57474cec999adae > > abe92c0540e2a51 100644 > > > --- a/gcc/internal-fn.cc > > > +++ b/gcc/internal-fn.cc > > > @@ -101,7 +101,7 @@ lookup_internal_fn (const char *name) > > > extern void > > > lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn > > > *hi) > > > { > > > - gcc_assert (widening_fn_p (ifn)); > > > + gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn)); > > > > > > switch (ifn) > > > { > > > @@ -113,6 +113,11 @@ lookup_hilo_internal_fn (internal_fn ifn, internal_fn > > *lo, internal_fn *hi) > > > *lo = internal_fn (IFN_##NAME##_LO); \ > > > *hi = internal_fn (IFN_##NAME##_HI); \ > > > break; > > > +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, S, SO, UO, T1, T2) \ > > > + case IFN_##NAME: \ > > > + *lo = internal_fn (IFN_##NAME##_LO); \ > > > + *hi = internal_fn (IFN_##NAME##_HI); \ > > > + break; > > > #include "internal-fn.def" > > > } > > > } > > > @@ -124,7 +129,7 @@ extern void > > > lookup_evenodd_internal_fn (internal_fn ifn, internal_fn *even, > > > internal_fn *odd) > > > { > > > - gcc_assert (widening_fn_p (ifn)); > > > + gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn)); > > > > > > switch (ifn) > > > { > > > @@ -136,6 +141,11 @@ lookup_evenodd_internal_fn (internal_fn ifn, > > internal_fn *even, > > > *even = internal_fn (IFN_##NAME##_EVEN); \ > > > *odd = internal_fn (IFN_##NAME##_ODD); \ > > > break; > > > +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, S, SO, UO, T1, T2) \ > > > + case IFN_##NAME: \ > > > + *even = internal_fn (IFN_##NAME##_EVEN); \ > > > + *odd = internal_fn (IFN_##NAME##_ODD); \ > > > + break; > > > #include "internal-fn.def" > > > } > > > } > > > @@ -4548,6 +4558,35 @@ widening_fn_p (code_helper code) > > > } > > > } > > > > > > +/* Return true if this CODE describes an internal_fn that returns a > > > vector with > > > + elements twice as 
narrow as the element size of the input vectors. */ > > > + > > > +bool > > > +narrowing_fn_p (code_helper code) > > > +{ > > > + if (!code.is_fn_code ()) > > > + return false; > > > + > > > + if (!internal_fn_p ((combined_fn) code)) > > > + return false; > > > + > > > + internal_fn fn = as_internal_fn ((combined_fn) code); > > > + switch (fn) > > > + { > > > + #define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, S, SO, UO, T1, > > T2) \ > > > + case IFN_##NAME: > > > \ > > > + case IFN_##NAME##_HI: > > > \ > > > + case IFN_##NAME##_LO: > > > \ > > > + case IFN_##NAME##_EVEN: > > > \ > > > + case IFN_##NAME##_ODD: > > > \ > > > + return true; > > > + #include "internal-fn.def" > > > + > > > + default: > > > + return false; > > > + } > > > +} > > > + > > > /* Return true if this CODE describes an internal_fn that returns a > > > vector with > > > elements twice as wide as the element size of the input vectors and > > > operates > > > on even/odd parts of the input. */ > > > @@ -4575,6 +4614,33 @@ widening_evenodd_fn_p (code_helper code) > > > } > > > } > > > > > > +/* Return true if this CODE describes an internal_fn that returns a > > > vector with > > > + elements twice as narrow as the element size of the input vectors and > > > + operates on even/odd parts of the input. */ > > > + > > > +bool > > > +narrowing_evenodd_fn_p (code_helper code) > > > +{ > > > + if (!code.is_fn_code ()) > > > + return false; > > > + > > > + if (!internal_fn_p ((combined_fn) code)) > > > + return false; > > > + > > > + internal_fn fn = as_internal_fn ((combined_fn) code); > > > + switch (fn) > > > + { > > > + #define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, S, SO, UO, T1, > > T2) \ > > > + case IFN_##NAME##_EVEN: > > > \ > > > + case IFN_##NAME##_ODD: > > > \ > > > + return true; > > > + #include "internal-fn.def" > > > + > > > + default: > > > + return false; > > > + } > > > +} > > > + > > > /* Return true if IFN_SET_EDOM is supported. 
*/ > > > > > > bool > > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > > > index > > d2480a1bf7927476215bc7bb99c0b74197d2b7e9..69677dd10b980c83dec364 > > 87b1214ff066f4789b 100644 > > > --- a/gcc/internal-fn.def > > > +++ b/gcc/internal-fn.def > > > @@ -40,6 +40,8 @@ along with GCC; see the file COPYING3. If not see > > > DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE) > > > DEF_INTERNAL_WIDENING_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, > > UOPTAB, > > > TYPE) > > > + DEF_INTERNAL_NARROWING_OPTAB_FN (NAME, FLAGS, SELECTOR, > > SOPTAB, UOPTAB, > > > + TYPE_LO, TYPE_HI) > > > > > > where NAME is the name of the function, FLAGS is a set of > > > ECF_* flags and FNSPEC is a string describing functions fnspec. > > > @@ -122,6 +124,21 @@ along with GCC; see the file COPYING3. If not see > > > These five internal functions will require two optabs each, a > > > SIGNED_OPTAB > > > and an UNSIGNED_OTPAB. > > > > > > + DEF_INTERNAL_NARROWING_OPTAB_FN is a wrapper that defines five > > internal > > > + functions with DEF_INTERNAL_SIGNED_OPTAB_FN: > > > + - one that describes a narrowing operation with the same number of > > > elements > > > + in the output and input vectors, > > > + - two that describe a pair of high-low narrowing operations where the > > > output > > > + vectors each have half the number of elements of the input vectors, > > > + corresponding to the result of the narrowing operation on the top > > > half and > > > + bottom half, these have the suffixes _HI and _LO, > > > + - and two that describe a pair of even-odd narrowing operations where > > > the > > > + output vectors each have half the number of elements of the input > > > vectors, > > > + corresponding to the result of the narrowing operation on the even > > > and odd > > > + elements, these have the suffixes _EVEN and _ODD. > > > + These five internal functions will require two optabs each, a > > > SIGNED_OPTAB > > > + and an UNSIGNED_OTPAB. 
> > > + > > > DEF_INTERNAL_COND_FN is a wrapper that defines 2 internal functions > > > with > > > DEF_INTERNAL_OPTAB_FN: > > > - One is COND_* operations that are predicated by mask only. Such > > > operations > > > @@ -184,6 +201,15 @@ along with GCC; see the file COPYING3. If not see > > > DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _ODD, FLAGS, SELECTOR, > > SOPTAB##_odd, UOPTAB##_odd, TYPE) > > > #endif > > > > > > +#ifndef DEF_INTERNAL_NARROWING_OPTAB_FN > > > +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, FLAGS, SELECTOR, > > SOPTAB, UOPTAB, TYPE_LO, TYPE_HI) \ > > > + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, > > UOPTAB, TYPE_LO) \ > > > + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, > > SOPTAB##_lo, UOPTAB##_lo, TYPE_LO) \ > > > + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, > > SOPTAB##_hi, UOPTAB##_hi, TYPE_HI) \ > > > + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _EVEN, FLAGS, SELECTOR, > > SOPTAB##_even, UOPTAB##_even, TYPE_LO) \ > > > + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _ODD, FLAGS, SELECTOR, > > SOPTAB##_odd, UOPTAB##_odd, TYPE_HI) > > > +#endif > > > + > > > #ifndef DEF_INTERNAL_COND_FN > > > #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) > > > \ > > > DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB, > > cond_##TYPE) \ > > > @@ -608,6 +634,7 @@ DEF_INTERNAL_OPTAB_FN (BIT_ANDN, ECF_CONST, > > andn, binary) > > > DEF_INTERNAL_OPTAB_FN (BIT_IORN, ECF_CONST, iorn, binary) > > > > > > #undef DEF_INTERNAL_WIDENING_OPTAB_FN > > > +#undef DEF_INTERNAL_NARROWING_OPTAB_FN > > > #undef DEF_INTERNAL_SIGNED_COND_FN > > > #undef DEF_INTERNAL_COND_FN > > > #undef DEF_INTERNAL_INT_EXT_FN > > > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h > > > index > > fd21694dfebfb8518810fd85f7aa8c45dd4c362e..8c6ad218e4412716ba7b79b24 > > af708920e11e3be 100644 > > > --- a/gcc/internal-fn.h > > > +++ b/gcc/internal-fn.h > > > @@ -220,6 +220,8 @@ extern int first_commutative_argument (internal_fn); > > > extern bool 
associative_binary_fn_p (internal_fn); > > > extern bool widening_fn_p (code_helper); > > > extern bool widening_evenodd_fn_p (code_helper); > > > +extern bool narrowing_fn_p (code_helper); > > > +extern bool narrowing_evenodd_fn_p (code_helper); > > > > > > extern bool set_edom_supported_p (void); > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > index > > 675c6e2e683c59df44d5d7d65b87900a70506f50..97b3d4801d19f3168b91c91 > > 271e882bad3f99f13 100644 > > > --- a/gcc/tree-vect-stmts.cc > > > +++ b/gcc/tree-vect-stmts.cc > > > @@ -3157,15 +3157,20 @@ simple_integer_narrowing (tree vectype_out, tree > > vectype_in, > > > || !INTEGRAL_TYPE_P (TREE_TYPE (vectype_in))) > > > return false; > > > > > > - code_helper code; > > > + code_helper code1 = ERROR_MARK, code2 = ERROR_MARK; > > > int multi_step_cvt = 0; > > > auto_vec <tree, 8> interm_types; > > > if (!supportable_narrowing_operation (NOP_EXPR, vectype_out, > > > vectype_in, > > > - &code, &multi_step_cvt, &interm_types) > > > + &code1, &code2, &multi_step_cvt, > > > + &interm_types) > > > || multi_step_cvt) > > > return false; > > > > > > - *convert_code = code; > > > + /* Simple narrowing never have hi/lo splits. 
*/ > > > + if (code2 != ERROR_MARK) > > > + return false; > > > + > > > + *convert_code = code1; > > > return true; > > > } > > > > > > @@ -3375,6 +3380,7 @@ vectorizable_call (vec_info *vinfo, > > > if (cfn != CFN_LAST > > > && (modifier == NONE > > > || (modifier == NARROW > > > + && !narrowing_fn_p (cfn) > > > && simple_integer_narrowing (vectype_out, vectype_in, > > > &convert_code)))) > > > ifn = vectorizable_internal_function (cfn, callee, vectype_out, > > > @@ -3511,7 +3517,7 @@ vectorizable_call (vec_info *vinfo, > > > if (clz_ctz_arg1) > > > ++vect_nargs; > > > > > > - if (modifier == NONE || ifn != IFN_LAST) > > > + if (modifier == NONE || (ifn != IFN_LAST && !narrowing_fn_p (ifn))) > > > { > > > tree prev_res = NULL_TREE; > > > vargs.safe_grow (vect_nargs, true); > > > @@ -5058,7 +5064,8 @@ vectorizable_conversion (vec_info *vinfo, > > > if (!widen_arith > > > && !CONVERT_EXPR_CODE_P (code) > > > && code != FIX_TRUNC_EXPR > > > - && code != FLOAT_EXPR) > > > + && code != FLOAT_EXPR > > > + && !narrowing_fn_p (code)) > > > return false; > > > > > > /* Check types of lhs and rhs. */ > > > @@ -5102,7 +5109,8 @@ vectorizable_conversion (vec_info *vinfo, > > > { > > > gcc_assert (code == WIDEN_MULT_EXPR > > > || code == WIDEN_LSHIFT_EXPR > > > - || widening_fn_p (code)); > > > + || widening_fn_p (code) > > > + || narrowing_fn_p (code)); > > > > > > op1 = is_gimple_assign (stmt) ? 
gimple_assign_rhs2 (stmt) : > > > gimple_call_arg (stmt, 0); > > > @@ -5285,9 +5293,9 @@ vectorizable_conversion (vec_info *vinfo, > > > break; > > > > > > case NARROW_DST: > > > - gcc_assert (op_type == unary_op); > > > + gcc_assert (op_type == unary_op || op_type == binary_op); > > > if (supportable_narrowing_operation (code, vectype_out, vectype_in, > > > - &code1, &multi_step_cvt, > > > + &code1, &code2, &multi_step_cvt, > > > &interm_types)) > > > break; > > > > > > @@ -5307,7 +5315,7 @@ vectorizable_conversion (vec_info *vinfo, > > > else > > > goto unsupported; > > > if (supportable_narrowing_operation (NOP_EXPR, vectype_out, > > cvt_type, > > > - &code1, &multi_step_cvt, > > > + &code1, &code2, &multi_step_cvt, > > > &interm_types)) > > > break; > > > } > > > @@ -5336,7 +5344,7 @@ vectorizable_conversion (vec_info *vinfo, > > > if (cvt_type == NULL_TREE) > > > goto unsupported; > > > if (!supportable_narrowing_operation (NOP_EXPR, cvt_type, vectype_in, > > > - &code1, &multi_step_cvt, > > > + &code1, &code2, > > &multi_step_cvt, > > > &interm_types)) > > > goto unsupported; > > > if (supportable_convert_operation ((tree_code) code, vectype_out, > > > @@ -5553,11 +5561,44 @@ vectorizable_conversion (vec_info *vinfo, > > > vec_oprnds0[i] = new_temp; > > > } > > > > > > - vect_create_vectorized_demotion_stmts (vinfo, &vec_oprnds0, > > > - multi_step_cvt, > > > - stmt_info, vec_dsts, gsi, > > > - slp_node, code1, > > > - modifier == NARROW_SRC); > > > + if (modifier == NARROW_DST && narrowing_fn_p (code)) > > > + { > > > + gcc_assert (op_type == binary_op); > > > + vect_get_vec_defs (vinfo, slp_node, op0, &vec_oprnds0, > > > + op1, &vec_oprnds1); > > > + tree vop0, vop1; > > > + internal_fn ifn1 = as_internal_fn ((combined_fn)code1); > > > + internal_fn ifn2 = as_internal_fn ((combined_fn)code2); > > > + tree small_type > > > + = get_related_vectype_for_scalar_type (TYPE_MODE (vectype_out), > > > + TREE_TYPE (vectype_out), > > > + exact_div > > 
(TYPE_VECTOR_SUBPARTS (vectype_out), 2)); > > > + for (unsigned i = 0; i < vec_oprnds0.length (); i += 2) > > > + { > > > + vop0 = vec_oprnds0[i]; > > > + vop1 = vec_oprnds1[i]; > > > + gimple *new_stmt > > > + = gimple_build_call_internal (ifn1, 2, vop0, vop1); > > > + tree new_tmp = make_ssa_name (small_type); > > > + gimple_call_set_lhs (new_stmt, new_tmp); > > > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); > > > + > > > + vop0 = vec_oprnds0[i + 1]; > > > + vop1 = vec_oprnds1[i + 1]; > > > + new_stmt > > > + = gimple_build_call_internal (ifn2, 3, vop0, vop1, new_tmp); > > > + new_tmp = make_ssa_name (vec_dest, new_stmt); > > > + gimple_call_set_lhs (new_stmt, new_tmp); > > > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); > > > + slp_node->push_vec_def (new_stmt); > > > + } > > > + } > > > + else > > > + vect_create_vectorized_demotion_stmts (vinfo, &vec_oprnds0, > > > + multi_step_cvt, > > > + stmt_info, vec_dsts, gsi, > > > + slp_node, code1, > > > + modifier == NARROW_SRC); > > > /* After demoting op0 to cvt_type, convert it to dest. */ > > > if (cvt_type && code == FLOAT_EXPR) > > > { > > > @@ -13616,6 +13657,8 @@ supportable_widening_operation (vec_info *vinfo, > > > Output: > > > - CODE1 is the code of a vector operation to be used when > > > vectorizing the operation, if available. > > > + - CODE2 is the code of a vector operation for the high part to be > > > used when > > > + vectorizing the operation, if available. > > > - MULTI_STEP_CVT determines the number of required intermediate steps > > > in > > > case of multi-step conversion (like int->short->char - in that case > > > MULTI_STEP_CVT will be 1). 
> > > @@ -13625,64 +13668,117 @@ supportable_widening_operation (vec_info > > *vinfo, > > > bool > > > supportable_narrowing_operation (code_helper code, > > > tree vectype_out, tree vectype_in, > > > - code_helper *code1, int *multi_step_cvt, > > > - vec<tree> *interm_types) > > > + code_helper *code1, code_helper *code2, > > > + int *multi_step_cvt, vec<tree> *interm_types) > > > { > > > machine_mode vec_mode; > > > - enum insn_code icode1; > > > - optab optab1, interm_optab; > > > + enum insn_code icode1 = CODE_FOR_nothing, icode2 = CODE_FOR_nothing; > > > + optab optab1 = unknown_optab, optab2 = unknown_optab, interm_optab; > > > tree vectype = vectype_in; > > > tree narrow_vectype = vectype_out; > > > - enum tree_code c1; > > > + code_helper c1 = ERROR_MARK; > > > tree intermediate_type, prev_type; > > > machine_mode intermediate_mode, prev_mode; > > > int i; > > > unsigned HOST_WIDE_INT n_elts; > > > bool uns; > > > > > > - if (!code.is_tree_code ()) > > > - return false; > > > - > > > + vec_mode = TYPE_MODE (vectype); > > > *multi_step_cvt = 0; > > > - switch ((tree_code) code) > > > + if (narrowing_fn_p (code)) > > > + { > > > + /* If this is an internal fn then we must check whether the target > > > + supports the narrowing in one go. */ > > > + internal_fn ifn = as_internal_fn ((combined_fn) code); > > > + > > > + internal_fn lo, hi, even, odd; > > > + lookup_hilo_internal_fn (ifn, &lo, &hi); > > > + if (BYTES_BIG_ENDIAN) > > > + std::swap (lo, hi); > > > + *code1 = as_combined_fn (lo); > > > + *code2 = as_combined_fn (hi); > > > + optab1 = direct_internal_fn_optab (lo, {vectype, vectype}); > > > + optab2 = direct_internal_fn_optab (hi, {vectype, vectype}); > > > + > > > + /* If we don't support low-high, then check for even-odd. 
*/ > > > + if (!optab1 > > > + || (icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing > > > + || !optab2 > > > + || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing) > > > + { > > > + lookup_evenodd_internal_fn (ifn, &even, &odd); > > > + *code1 = as_combined_fn (even); > > > + *code2 = as_combined_fn (odd); > > > + optab1 = direct_internal_fn_optab (even, {vectype, vectype}); > > > + optab2 = direct_internal_fn_optab (odd, {vectype, vectype}); > > > + } > > > + } > > > + else if (code.is_tree_code ()) > > > { > > > - CASE_CONVERT: > > > - c1 = VEC_PACK_TRUNC_EXPR; > > > - if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype) > > > - && VECTOR_BOOLEAN_TYPE_P (vectype) > > > - && SCALAR_INT_MODE_P (TYPE_MODE (vectype)) > > > - && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&n_elts) > > > - && n_elts < BITS_PER_UNIT) > > > - optab1 = vec_pack_sbool_trunc_optab; > > > - else > > > - optab1 = optab_for_tree_code (c1, vectype, optab_default); > > > - break; > > > + switch ((tree_code) code) > > > + { > > > + CASE_CONVERT: > > > + c1 = VEC_PACK_TRUNC_EXPR; > > > + if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype) > > > + && VECTOR_BOOLEAN_TYPE_P (vectype) > > > + && SCALAR_INT_MODE_P (TYPE_MODE (vectype)) > > > + && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&n_elts) > > > + && n_elts < BITS_PER_UNIT) > > > + optab1 = vec_pack_sbool_trunc_optab; > > > + else > > > + optab1 = optab_for_tree_code ((tree_code)c1, vectype, > > > + optab_default); > > > + break; > > > > > > - case FIX_TRUNC_EXPR: > > > - c1 = VEC_PACK_FIX_TRUNC_EXPR; > > > - /* The signedness is determined from output operand. */ > > > - optab1 = optab_for_tree_code (c1, vectype_out, optab_default); > > > - break; > > > + case FIX_TRUNC_EXPR: > > > + c1 = VEC_PACK_FIX_TRUNC_EXPR; > > > + /* The signedness is determined from output operand. 
*/ > > > + optab1 = optab_for_tree_code ((tree_code)c1, vectype_out, > > > + optab_default); > > > + break; > > > > > > - case FLOAT_EXPR: > > > - c1 = VEC_PACK_FLOAT_EXPR; > > > - optab1 = optab_for_tree_code (c1, vectype, optab_default); > > > - break; > > > + case FLOAT_EXPR: > > > + c1 = VEC_PACK_FLOAT_EXPR; > > > + optab1 = optab_for_tree_code ((tree_code)c1, vectype_out, > > > + optab_default); > > > + break; > > > > > > - default: > > > - gcc_unreachable (); > > > + default: > > > + gcc_unreachable (); > > > + } > > > } > > > + else > > > + return false; > > > > > > if (!optab1) > > > return false; > > > > > > - vec_mode = TYPE_MODE (vectype); > > > - if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing) > > > - return false; > > > + if (narrowing_fn_p (code)) > > > + { > > > + if (!optab2) > > > + return false; > > > + if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing > > > + || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing) > > > + return false; > > > + } > > > + else > > > + { > > > + if ((icode1 = optab_handler (optab1, vec_mode)) == > > > CODE_FOR_nothing) > > > + return false; > > > > > > - *code1 = c1; > > > + *code1 = c1; > > > + } > > > > > > - if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype)) > > > + machine_mode nmode; > > > + machine_mode vmode = TYPE_MODE (narrow_vectype); > > > + scalar_mode emode = GET_MODE_INNER (vmode); > > > + poly_uint64 hnunits; > > > + if (insn_data[icode1].operand[0].mode == vmode > > > + || (narrowing_fn_p (code) > > > + && known_ne (hnunits = exact_div (GET_MODE_NUNITS (vmode), 2U), > > 0U) > > > + && related_vector_mode (vmode, emode, hnunits).exists (&nmode) > > > + && insn_data[icode1].operand[0].mode == nmode > > > + && insn_data[icode2].operand[0].mode == vmode)) > > > { > > > if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > > > return true; > > > @@ -13716,7 +13812,7 @@ supportable_narrowing_operation (code_helper > > code, > > > intermediate_type > > 
> = lang_hooks.types.type_for_mode (TYPE_MODE (vectype_out), 0); > > > interm_optab > > > - = optab_for_tree_code (c1, intermediate_type, optab_default); > > > + = optab_for_tree_code ((tree_code)c1, intermediate_type, optab_default); > > > if (interm_optab != unknown_optab > > > && (icode2 = optab_handler (optab1, vec_mode)) != CODE_FOR_nothing > > > && insn_data[icode1].operand[0].mode > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > > index > > 3d8a9466982a0c29099e60ed7a84e0f5ed207fa9..026dfb131b4c2808290fdbd0 > > 15b63dab5918c7f2 100644 > > > --- a/gcc/tree-vectorizer.h > > > +++ b/gcc/tree-vectorizer.h > > > @@ -2463,8 +2463,8 @@ extern bool supportable_widening_operation > > (vec_info*, code_helper, > > > code_helper*, code_helper*, > > > int*, vec<tree> *); > > > extern bool supportable_narrowing_operation (code_helper, tree, tree, > > > - code_helper *, int *, > > > - vec<tree> *); > > > + code_helper *, code_helper *, > > > + int *, vec<tree> *); > > > extern bool supportable_indirect_convert_operation (code_helper, > > > tree, tree, > > > vec<std::pair<tree, > > > tree_code> > > > &, > > > > > > > > > > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)