> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Wednesday, August 20, 2025 1:48 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; rdsandif...@googlemail.com; nd <n...@arm.com>
> Subject: Re: [PATCH 1/5]middle-end: Add scaffolding to support narrowing IFNs
> 
> On Tue, 19 Aug 2025, Tamar Christina wrote:
> 
> > This adds scaffolding for supporting narrowing IFNs inside the vectorizer in a
> > similar way as how widening is supported.  However because narrowing operations
> > always have the same number of elements as the input and output we need to be
> > able to combine the results.  One way this could have been done is by using a
> > vec_perm_expr but this then can become tricky to recognize as low/hi pairs in
> > backends.
> >
> > As such I've chosen the design where the _hi and _odd variants of the
> > instructions must always be RMW.  This simplifies the implementation and targets
> > that don't want this can use the direct conversion variant.
> 
> the canonical way for "narrowing" would be to have a
> 
> vec_pack_saddh_optab
> 
> that takes two input vectors for each operand (we currently have such
> for conversions, aka the single operand case).  There's no hi/lo
> involved, that's only for widening as we can't have two outputs.
> 
> So - no, we don't want this new odd way of doing.  Either only go with
> vec_saddh_narrow, aka the result mode is of half size, if that suits
> you, or please add the first "pack" variant of a binary operation.
> 
> "pack" would imply narrow here.  Alternatively vec_pack_narrow_saddh
> and vec_narrow_saddh as the two variants.
> 
> Note that for composition I'd use a CTOR.  Note that in your scheme
> the even/odd variant would interleave one result into the other?
> Would the binary optab then fill only every 2nd output lane?  The
> documentation in 2/n isn't exactly clear here.

We've discussed a lot of this on IRC and I believe much of it is a
misunderstanding related to how the current optabs are documented.  As with our
current documentation, I put the detailed documentation with the IFN rather
than with the optab.

This follows the convention seemingly established by the widening variants.

But to come back to what we ended up with on IRC.

I proposed:

  FOO          -> {V4SI, V4SI}       -> V4HI
  FOO_LO       -> {V4SI, V4SI}       -> V4HI
  FOO_MERGE_HI -> {V4SI, V4SI, V4HI} -> V8HI

And you proposed instead:

  FOO_LO   -> {V4SI, V4SI} -> V8HI
  FOO_EVEN -> {V4SI, V4SI} -> V8HI

This is because the x86 variants of these instructions return registers of the
same size as the inputs.  However, the documentation of these instructions [1]
states that for the AVX variants the upper half is zeroed, and for SSE the
upper half is undefined.

To me this means that x86 does not have _HI/_LO variants of these instructions
and that we're making a change to accommodate an ISA that can't support them.
I believe x86 only has FOO.

And for FOO I don't think we should return V8HI, because on SSE the top bits
are undefined.  You proposed that I introduce an explicit zeroing first, but
SSE and AVX are not consistent here.  I believe FOO should return V4HI because
those are the only bits whose contents *all* ISAs specify, and it avoids any
endianness issue.
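
To make the lane semantics concrete, here's a minimal behavioral sketch in
plain C++ (not GCC code; the scalar "foo" is only an illustrative stand-in,
e.g. for a saturating add-high, and all names are made up for this email) of
the FOO -> {V4SI, V4SI} -> V4HI shape I'm proposing:

  #include <array>
  #include <cstdint>

  /* Illustrative scalar stand-in for the narrowing operation; the exact
     scalar semantics don't matter for the lane layout being discussed.  */
  static int16_t
  foo (int32_t a, int32_t b)
  {
    return (int16_t) (((int64_t) a + b) >> 16);
  }

  /* FOO: {V4SI, V4SI} -> V4HI.  Only four result lanes exist, so no ISA
     has to define what an "upper half" would contain.  */
  std::array<int16_t, 4>
  vec_foo (std::array<int32_t, 4> a, std::array<int32_t, 4> b)
  {
    std::array<int16_t, 4> r;
    for (int i = 0; i < 4; ++i)
      r[i] = foo (a[i], b[i]);
    return r;
  }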

We discussed EVEN and ODD as well.  I think there is again a fundamental issue
with EVEN/ODD, one that hasn't been encountered because, well, nothing uses the
code.  Unlike HI/LO, EVEN/ODD cannot be detected as a pattern, because the
lanes to permute around may simply not be there *unless* you unroll, and
patterns can't force an unroll since unroll factors are determined after all
pattern matching.

That means they can only be detected later on, which in turn means EVEN/ODD
detection cannot fundamentally rely on, or relate to, the generic FOO.  This
patch does not ascribe any definition or implementation to EVEN/ODD aside from
ODD having to be RMW.

I don't think this can be described any differently, because widening *reads*
and narrowing *writes*.  So FOO_EVEN -> {V4SI, V4SI} -> V8HI is already true in
this patch, because, again, it's not used, so the modes can be anything.  I
only added it because for some reason EVEN/ODD was required before, even though
there isn't an implementation for it; I'm not introducing that convention, I'm
just following it.
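
For clarity, here is one possible reading of the RMW requirement on ODD, again
as a plain C++ behavioral sketch rather than GCC code, and assuming, purely for
illustration, that the ODD variant takes the vector it merges into as a third
operand; the lane placement shown is illustrative only, since as said above the
patch deliberately ascribes no further definition to EVEN/ODD:

  #include <array>
  #include <cstdint>

  /* Same illustrative scalar stand-in as in the FOO sketch above.  */
  static int16_t
  foo (int32_t a, int32_t b)
  {
    return (int16_t) (((int64_t) a + b) >> 16);
  }

  /* Hypothetical FOO_ODD as RMW: {V4SI, V4SI, V8HI} -> V8HI.  The results
     are written into half the lanes of MERGE (the odd ones here, purely
     for illustration) and the remaining lanes of MERGE pass through
     unchanged.  */
  std::array<int16_t, 8>
  vec_foo_odd (std::array<int32_t, 4> a, std::array<int32_t, 4> b,
               std::array<int16_t, 8> merge)
  {
    for (int i = 0; i < 4; ++i)
      merge[2 * i + 1] = foo (a[i], b[i]);
    return merge;
  }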

So again I propose:

  FOO          -> {V4SI, V4SI}       -> V4HI
  FOO_LO       -> {V4SI, V4SI}       -> V4HI
  FOO_MERGE_HI -> {V4SI, V4SI, V4HI} -> V8HI

because FOO_LO -> {V4SI, V4SI} -> V8HI is not a native operation for AArch64,
x86 maps naturally to FOO, and FOO doesn't have endianness issues, so I don't
see a good reason to complicate the implementation for the target that can
actually support these instructions.
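
As a final behavioral sketch (again illustrative plain C++, not GCC code,
reusing the same stand-in scalar), this is how the proposed FOO_LO and
FOO_MERGE_HI would compose into a full V8HI result, which also matches the
shape the vectorizable_conversion hunk in the patch quoted below emits (a
two-operand _LO call followed by a three-operand RMW call), with the lane
order shown for little endian:

  #include <array>
  #include <cstdint>

  /* Same illustrative scalar stand-in as before.  */
  static int16_t
  foo (int32_t a, int32_t b)
  {
    return (int16_t) (((int64_t) a + b) >> 16);
  }

  /* FOO_LO: {V4SI, V4SI} -> V4HI, the low half of the final vector.  */
  std::array<int16_t, 4>
  vec_foo_lo (std::array<int32_t, 4> a, std::array<int32_t, 4> b)
  {
    std::array<int16_t, 4> r;
    for (int i = 0; i < 4; ++i)
      r[i] = foo (a[i], b[i]);
    return r;
  }

  /* FOO_MERGE_HI: {V4SI, V4SI, V4HI} -> V8HI.  RMW: the V4HI operand
     becomes the low half of the result and the narrowed values of the
     two V4SI operands become the high half.  */
  std::array<int16_t, 8>
  vec_foo_merge_hi (std::array<int32_t, 4> a, std::array<int32_t, 4> b,
                    std::array<int16_t, 4> lo)
  {
    std::array<int16_t, 8> r;
    for (int i = 0; i < 4; ++i)
      {
        r[i] = lo[i];
        r[i + 4] = foo (a[i], b[i]);
      }
    return r;
  }

  /* Vectorizing two pairs of input vectors then composes as:
       tmp  = vec_foo_lo (a0, b0);             V4HI
       full = vec_foo_merge_hi (a1, b1, tmp);  V8HI  */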

Thanks,
Tamar

[1] https://www.felixcloutier.com/x86/addpd
> 
> Thanks,
> Richard.
> 
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >     * internal-fn.cc (lookup_hilo_internal_fn,
> >     DEF_INTERNAL_NARROWING_OPTAB_FN, lookup_evenodd_internal_fn,
> >     narrowing_fn_p, narrowing_evenodd_fn_p): New.
> >     * internal-fn.def (DEF_INTERNAL_NARROWING_OPTAB_FN): New.
> >     * internal-fn.h (narrowing_fn_p, narrowing_evenodd_fn_p): New.
> >     * tree-vect-stmts.cc (simple_integer_narrowing, vectorizable_call,
> >     vectorizable_conversion, supportable_widening_operation,
> >     supportable_narrowing_operation): Use it.
> >     * tree-vectorizer.h (supportable_narrowing_operation): Modify
> >     signature.
> >
> > ---
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index
> bf2fac8180706ec418de7eb97cd1260f1d078c03..83438dd2ff57474cec999adae
> abe92c0540e2a51 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -101,7 +101,7 @@ lookup_internal_fn (const char *name)
> >  extern void
> >  lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi)
> >  {
> > -  gcc_assert (widening_fn_p (ifn));
> > +  gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn));
> >
> >    switch (ifn)
> >      {
> > @@ -113,6 +113,11 @@ lookup_hilo_internal_fn (internal_fn ifn, internal_fn
> *lo, internal_fn *hi)
> >        *lo = internal_fn (IFN_##NAME##_LO);                 \
> >        *hi = internal_fn (IFN_##NAME##_HI);                 \
> >        break;
> > +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, S, SO, UO, T1, T2) \
> > +    case IFN_##NAME:                                                   \
> > +      *lo = internal_fn (IFN_##NAME##_LO);                     \
> > +      *hi = internal_fn (IFN_##NAME##_HI);                     \
> > +      break;
> >  #include "internal-fn.def"
> >      }
> >  }
> > @@ -124,7 +129,7 @@ extern void
> >  lookup_evenodd_internal_fn (internal_fn ifn, internal_fn *even,
> >                         internal_fn *odd)
> >  {
> > -  gcc_assert (widening_fn_p (ifn));
> > +  gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn));
> >
> >    switch (ifn)
> >      {
> > @@ -136,6 +141,11 @@ lookup_evenodd_internal_fn (internal_fn ifn,
> internal_fn *even,
> >        *even = internal_fn (IFN_##NAME##_EVEN);                     \
> >        *odd = internal_fn (IFN_##NAME##_ODD);                       \
> >        break;
> > +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, S, SO, UO, T1, T2) \
> > +    case IFN_##NAME:                                                   \
> > +      *even = internal_fn (IFN_##NAME##_EVEN);                         \
> > +      *odd = internal_fn (IFN_##NAME##_ODD);                           \
> > +      break;
> >  #include "internal-fn.def"
> >      }
> >  }
> > @@ -4548,6 +4558,35 @@ widening_fn_p (code_helper code)
> >      }
> >  }
> >
> > +/* Return true if this CODE describes an internal_fn that returns a vector 
> > with
> > +   elements twice as narrow as the element size of the input vectors.  */
> > +
> > +bool
> > +narrowing_fn_p (code_helper code)
> > +{
> > +  if (!code.is_fn_code ())
> > +    return false;
> > +
> > +  if (!internal_fn_p ((combined_fn) code))
> > +    return false;
> > +
> > +  internal_fn fn = as_internal_fn ((combined_fn) code);
> > +  switch (fn)
> > +    {
> > +    #define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, S, SO, UO, T1,
> T2) \
> > +    case IFN_##NAME:                                                       
> > \
> > +    case IFN_##NAME##_HI:                                          \
> > +    case IFN_##NAME##_LO:                                          \
> > +    case IFN_##NAME##_EVEN:                                                
> > \
> > +    case IFN_##NAME##_ODD:                                         \
> > +      return true;
> > +    #include "internal-fn.def"
> > +
> > +    default:
> > +      return false;
> > +    }
> > +}
> > +
> >  /* Return true if this CODE describes an internal_fn that returns a vector 
> > with
> >     elements twice as wide as the element size of the input vectors and 
> > operates
> >     on even/odd parts of the input.  */
> > @@ -4575,6 +4614,33 @@ widening_evenodd_fn_p (code_helper code)
> >      }
> >  }
> >
> > +/* Return true if this CODE describes an internal_fn that returns a vector 
> > with
> > +   elements twice as narrow as the element size of the input vectors and
> > +   operates on even/odd parts of the input.  */
> > +
> > +bool
> > +narrowing_evenodd_fn_p (code_helper code)
> > +{
> > +  if (!code.is_fn_code ())
> > +    return false;
> > +
> > +  if (!internal_fn_p ((combined_fn) code))
> > +    return false;
> > +
> > +  internal_fn fn = as_internal_fn ((combined_fn) code);
> > +  switch (fn)
> > +    {
> > +    #define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, S, SO, UO, T1,
> T2) \
> > +    case IFN_##NAME##_EVEN:                                                
> > \
> > +    case IFN_##NAME##_ODD:                                         \
> > +      return true;
> > +    #include "internal-fn.def"
> > +
> > +    default:
> > +      return false;
> > +    }
> > +}
> > +
> >  /* Return true if IFN_SET_EDOM is supported.  */
> >
> >  bool
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index
> d2480a1bf7927476215bc7bb99c0b74197d2b7e9..69677dd10b980c83dec364
> 87b1214ff066f4789b 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -40,6 +40,8 @@ along with GCC; see the file COPYING3.  If not see
> >       DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE)
> >       DEF_INTERNAL_WIDENING_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB,
> UOPTAB,
> >                                  TYPE)
> > +     DEF_INTERNAL_NARROWING_OPTAB_FN (NAME, FLAGS, SELECTOR,
> SOPTAB, UOPTAB,
> > +                                TYPE_LO, TYPE_HI)
> >
> >     where NAME is the name of the function, FLAGS is a set of
> >     ECF_* flags and FNSPEC is a string describing functions fnspec.
> > @@ -122,6 +124,21 @@ along with GCC; see the file COPYING3.  If not see
> >     These five internal functions will require two optabs each, a 
> > SIGNED_OPTAB
> >     and an UNSIGNED_OTPAB.
> >
> > +   DEF_INTERNAL_NARROWING_OPTAB_FN is a wrapper that defines five
> internal
> > +   functions with DEF_INTERNAL_SIGNED_OPTAB_FN:
> > +   - one that describes a narrowing operation with the same number of 
> > elements
> > +   in the output and input vectors,
> > +   - two that describe a pair of high-low narrowing operations where the 
> > output
> > +   vectors each have half the number of elements of the input vectors,
> > +   corresponding to the result of the narrowing operation on the top half 
> > and
> > +   bottom half, these have the suffixes _HI and _LO,
> > +   - and two that describe a pair of even-odd narrowing operations where 
> > the
> > +   output vectors each have half the number of elements of the input 
> > vectors,
> > +   corresponding to the result of the narrowing operation on the even and 
> > odd
> > +   elements, these have the suffixes _EVEN and _ODD.
> > +   These five internal functions will require two optabs each, a 
> > SIGNED_OPTAB
> > +   and an UNSIGNED_OTPAB.
> > +
> >     DEF_INTERNAL_COND_FN is a wrapper that defines 2 internal functions with
> >     DEF_INTERNAL_OPTAB_FN:
> >     - One is COND_* operations that are predicated by mask only. Such 
> > operations
> > @@ -184,6 +201,15 @@ along with GCC; see the file COPYING3.  If not see
> >    DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _ODD, FLAGS, SELECTOR,
> SOPTAB##_odd, UOPTAB##_odd, TYPE)
> >  #endif
> >
> > +#ifndef DEF_INTERNAL_NARROWING_OPTAB_FN
> > +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, FLAGS, SELECTOR,
> SOPTAB, UOPTAB, TYPE_LO, TYPE_HI)       \
> > +  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB,
> UOPTAB, TYPE_LO)                             \
> > +  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR,
> SOPTAB##_lo, UOPTAB##_lo, TYPE_LO)       \
> > +  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR,
> SOPTAB##_hi, UOPTAB##_hi, TYPE_HI)       \
> > +  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _EVEN, FLAGS, SELECTOR,
> SOPTAB##_even, UOPTAB##_even, TYPE_LO) \
> > +  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _ODD, FLAGS, SELECTOR,
> SOPTAB##_odd, UOPTAB##_odd, TYPE_HI)
> > +#endif
> > +
> >  #ifndef DEF_INTERNAL_COND_FN
> >  #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE)                     
> >     \
> >    DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB,
> cond_##TYPE)        \
> > @@ -608,6 +634,7 @@ DEF_INTERNAL_OPTAB_FN (BIT_ANDN, ECF_CONST,
> andn, binary)
> >  DEF_INTERNAL_OPTAB_FN (BIT_IORN, ECF_CONST, iorn, binary)
> >
> >  #undef DEF_INTERNAL_WIDENING_OPTAB_FN
> > +#undef DEF_INTERNAL_NARROWING_OPTAB_FN
> >  #undef DEF_INTERNAL_SIGNED_COND_FN
> >  #undef DEF_INTERNAL_COND_FN
> >  #undef DEF_INTERNAL_INT_EXT_FN
> > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> > index
> fd21694dfebfb8518810fd85f7aa8c45dd4c362e..8c6ad218e4412716ba7b79b24
> af708920e11e3be 100644
> > --- a/gcc/internal-fn.h
> > +++ b/gcc/internal-fn.h
> > @@ -220,6 +220,8 @@ extern int first_commutative_argument (internal_fn);
> >  extern bool associative_binary_fn_p (internal_fn);
> >  extern bool widening_fn_p (code_helper);
> >  extern bool widening_evenodd_fn_p (code_helper);
> > +extern bool narrowing_fn_p (code_helper);
> > +extern bool narrowing_evenodd_fn_p (code_helper);
> >
> >  extern bool set_edom_supported_p (void);
> >
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index
> 675c6e2e683c59df44d5d7d65b87900a70506f50..97b3d4801d19f3168b91c91
> 271e882bad3f99f13 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -3157,15 +3157,20 @@ simple_integer_narrowing (tree vectype_out, tree
> vectype_in,
> >        || !INTEGRAL_TYPE_P (TREE_TYPE (vectype_in)))
> >      return false;
> >
> > -  code_helper code;
> > +  code_helper code1 = ERROR_MARK, code2 = ERROR_MARK;
> >    int multi_step_cvt = 0;
> >    auto_vec <tree, 8> interm_types;
> >    if (!supportable_narrowing_operation (NOP_EXPR, vectype_out, vectype_in,
> > -                                   &code, &multi_step_cvt, &interm_types)
> > +                                   &code1, &code2, &multi_step_cvt,
> > +                                   &interm_types)
> >        || multi_step_cvt)
> >      return false;
> >
> > -  *convert_code = code;
> > +  /* Simple narrowing never have hi/lo splits.  */
> > +  if (code2 != ERROR_MARK)
> > +    return false;
> > +
> > +  *convert_code = code1;
> >    return true;
> >  }
> >
> > @@ -3375,6 +3380,7 @@ vectorizable_call (vec_info *vinfo,
> >    if (cfn != CFN_LAST
> >        && (modifier == NONE
> >       || (modifier == NARROW
> > +         && !narrowing_fn_p (cfn)
> >           && simple_integer_narrowing (vectype_out, vectype_in,
> >                                        &convert_code))))
> >      ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> > @@ -3511,7 +3517,7 @@ vectorizable_call (vec_info *vinfo,
> >    if (clz_ctz_arg1)
> >      ++vect_nargs;
> >
> > -  if (modifier == NONE || ifn != IFN_LAST)
> > +  if (modifier == NONE || (ifn != IFN_LAST && !narrowing_fn_p (ifn)))
> >      {
> >        tree prev_res = NULL_TREE;
> >        vargs.safe_grow (vect_nargs, true);
> > @@ -5058,7 +5064,8 @@ vectorizable_conversion (vec_info *vinfo,
> >    if (!widen_arith
> >        && !CONVERT_EXPR_CODE_P (code)
> >        && code != FIX_TRUNC_EXPR
> > -      && code != FLOAT_EXPR)
> > +      && code != FLOAT_EXPR
> > +      && !narrowing_fn_p (code))
> >      return false;
> >
> >    /* Check types of lhs and rhs.  */
> > @@ -5102,7 +5109,8 @@ vectorizable_conversion (vec_info *vinfo,
> >      {
> >        gcc_assert (code == WIDEN_MULT_EXPR
> >               || code == WIDEN_LSHIFT_EXPR
> > -             || widening_fn_p (code));
> > +             || widening_fn_p (code)
> > +             || narrowing_fn_p (code));
> >
> >        op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
> >                                  gimple_call_arg (stmt, 0);
> > @@ -5285,9 +5293,9 @@ vectorizable_conversion (vec_info *vinfo,
> >        break;
> >
> >      case NARROW_DST:
> > -      gcc_assert (op_type == unary_op);
> > +      gcc_assert (op_type == unary_op || op_type == binary_op);
> >        if (supportable_narrowing_operation (code, vectype_out, vectype_in,
> > -                                      &code1, &multi_step_cvt,
> > +                                      &code1, &code2, &multi_step_cvt,
> >                                        &interm_types))
> >     break;
> >
> > @@ -5307,7 +5315,7 @@ vectorizable_conversion (vec_info *vinfo,
> >       else
> >         goto unsupported;
> >       if (supportable_narrowing_operation (NOP_EXPR, vectype_out,
> cvt_type,
> > -                                          &code1, &multi_step_cvt,
> > +                                          &code1, &code2, &multi_step_cvt,
> >                                            &interm_types))
> >         break;
> >     }
> > @@ -5336,7 +5344,7 @@ vectorizable_conversion (vec_info *vinfo,
> >       if (cvt_type == NULL_TREE)
> >         goto unsupported;
> >       if (!supportable_narrowing_operation (NOP_EXPR, cvt_type, vectype_in,
> > -                                           &code1, &multi_step_cvt,
> > +                                           &code1, &code2,
> &multi_step_cvt,
> >                                             &interm_types))
> >         goto unsupported;
> >       if (supportable_convert_operation ((tree_code) code, vectype_out,
> > @@ -5553,11 +5561,44 @@ vectorizable_conversion (vec_info *vinfo,
> >         vec_oprnds0[i] = new_temp;
> >       }
> >
> > -      vect_create_vectorized_demotion_stmts (vinfo, &vec_oprnds0,
> > -                                        multi_step_cvt,
> > -                                        stmt_info, vec_dsts, gsi,
> > -                                        slp_node, code1,
> > -                                        modifier == NARROW_SRC);
> > +      if (modifier == NARROW_DST && narrowing_fn_p (code))
> > +   {
> > +     gcc_assert (op_type == binary_op);
> > +     vect_get_vec_defs (vinfo, slp_node, op0, &vec_oprnds0,
> > +                        op1, &vec_oprnds1);
> > +     tree vop0, vop1;
> > +     internal_fn ifn1 = as_internal_fn ((combined_fn)code1);
> > +     internal_fn ifn2 = as_internal_fn ((combined_fn)code2);
> > +     tree small_type
> > +       = get_related_vectype_for_scalar_type (TYPE_MODE (vectype_out),
> > +                                              TREE_TYPE (vectype_out),
> > +                                              exact_div
> (TYPE_VECTOR_SUBPARTS (vectype_out), 2));
> > +     for (unsigned i = 0; i < vec_oprnds0.length (); i += 2)
> > +       {
> > +         vop0 = vec_oprnds0[i];
> > +         vop1 = vec_oprnds1[i];
> > +         gimple *new_stmt
> > +           = gimple_build_call_internal (ifn1, 2, vop0, vop1);
> > +         tree new_tmp = make_ssa_name (small_type);
> > +         gimple_call_set_lhs (new_stmt, new_tmp);
> > +         vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> > +
> > +         vop0 = vec_oprnds0[i + 1];
> > +         vop1 = vec_oprnds1[i + 1];
> > +         new_stmt
> > +           = gimple_build_call_internal (ifn2, 3, vop0, vop1, new_tmp);
> > +         new_tmp = make_ssa_name (vec_dest, new_stmt);
> > +         gimple_call_set_lhs (new_stmt, new_tmp);
> > +         vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> > +         slp_node->push_vec_def (new_stmt);
> > +       }
> > +   }
> > +      else
> > +        vect_create_vectorized_demotion_stmts (vinfo, &vec_oprnds0,
> > +                                          multi_step_cvt,
> > +                                          stmt_info, vec_dsts, gsi,
> > +                                          slp_node, code1,
> > +                                          modifier == NARROW_SRC);
> >        /* After demoting op0 to cvt_type, convert it to dest.  */
> >        if (cvt_type && code == FLOAT_EXPR)
> >     {
> > @@ -13616,6 +13657,8 @@ supportable_widening_operation (vec_info *vinfo,
> >     Output:
> >     - CODE1 is the code of a vector operation to be used when
> >     vectorizing the operation, if available.
> > +   - CODE2 is the code of a vector operation for the high part to be used 
> > when
> > +   vectorizing the operation, if available.
> >     - MULTI_STEP_CVT determines the number of required intermediate steps in
> >     case of multi-step conversion (like int->short->char - in that case
> >     MULTI_STEP_CVT will be 1).
> > @@ -13625,64 +13668,117 @@ supportable_widening_operation (vec_info
> *vinfo,
> >  bool
> >  supportable_narrowing_operation (code_helper code,
> >                              tree vectype_out, tree vectype_in,
> > -                            code_helper *code1, int *multi_step_cvt,
> > -                                 vec<tree> *interm_types)
> > +                            code_helper *code1, code_helper *code2,
> > +                            int *multi_step_cvt, vec<tree> *interm_types)
> >  {
> >    machine_mode vec_mode;
> > -  enum insn_code icode1;
> > -  optab optab1, interm_optab;
> > +  enum insn_code icode1 = CODE_FOR_nothing, icode2 = CODE_FOR_nothing;
> > +  optab optab1 = unknown_optab, optab2 = unknown_optab, interm_optab;
> >    tree vectype = vectype_in;
> >    tree narrow_vectype = vectype_out;
> > -  enum tree_code c1;
> > +  code_helper c1 = ERROR_MARK;
> >    tree intermediate_type, prev_type;
> >    machine_mode intermediate_mode, prev_mode;
> >    int i;
> >    unsigned HOST_WIDE_INT n_elts;
> >    bool uns;
> >
> > -  if (!code.is_tree_code ())
> > -    return false;
> > -
> > +  vec_mode = TYPE_MODE (vectype);
> >    *multi_step_cvt = 0;
> > -  switch ((tree_code) code)
> > +  if (narrowing_fn_p (code))
> > +     {
> > +       /* If this is an internal fn then we must check whether the target
> > +     supports the narrowing in one go.  */
> > +      internal_fn ifn = as_internal_fn ((combined_fn) code);
> > +
> > +      internal_fn lo, hi, even, odd;
> > +      lookup_hilo_internal_fn (ifn, &lo, &hi);
> > +      if (BYTES_BIG_ENDIAN)
> > +   std::swap (lo, hi);
> > +      *code1 = as_combined_fn (lo);
> > +      *code2 = as_combined_fn (hi);
> > +      optab1 = direct_internal_fn_optab (lo, {vectype, vectype});
> > +      optab2 = direct_internal_fn_optab (hi, {vectype, vectype});
> > +
> > +      /* If we don't support low-high, then check for even-odd.  */
> > +      if (!optab1
> > +     || (icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing
> > +     || !optab2
> > +     || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing)
> > +   {
> > +     lookup_evenodd_internal_fn (ifn, &even, &odd);
> > +     *code1 = as_combined_fn (even);
> > +     *code2 = as_combined_fn (odd);
> > +     optab1 = direct_internal_fn_optab (even, {vectype, vectype});
> > +     optab2 = direct_internal_fn_optab (odd, {vectype, vectype});
> > +   }
> > +    }
> > +  else if (code.is_tree_code ())
> >      {
> > -    CASE_CONVERT:
> > -      c1 = VEC_PACK_TRUNC_EXPR;
> > -      if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> > -     && VECTOR_BOOLEAN_TYPE_P (vectype)
> > -     && SCALAR_INT_MODE_P (TYPE_MODE (vectype))
> > -     && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&n_elts)
> > -     && n_elts < BITS_PER_UNIT)
> > -   optab1 = vec_pack_sbool_trunc_optab;
> > -      else
> > -   optab1 = optab_for_tree_code (c1, vectype, optab_default);
> > -      break;
> > +      switch ((tree_code) code)
> > +   {
> > +   CASE_CONVERT:
> > +     c1 = VEC_PACK_TRUNC_EXPR;
> > +     if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> > +         && VECTOR_BOOLEAN_TYPE_P (vectype)
> > +         && SCALAR_INT_MODE_P (TYPE_MODE (vectype))
> > +         && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&n_elts)
> > +         && n_elts < BITS_PER_UNIT)
> > +       optab1 = vec_pack_sbool_trunc_optab;
> > +     else
> > +       optab1 = optab_for_tree_code ((tree_code)c1, vectype,
> > +                                     optab_default);
> > +     break;
> >
> > -    case FIX_TRUNC_EXPR:
> > -      c1 = VEC_PACK_FIX_TRUNC_EXPR;
> > -      /* The signedness is determined from output operand.  */
> > -      optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
> > -      break;
> > +   case FIX_TRUNC_EXPR:
> > +     c1 = VEC_PACK_FIX_TRUNC_EXPR;
> > +     /* The signedness is determined from output operand.  */
> > +     optab1 = optab_for_tree_code ((tree_code)c1, vectype_out,
> > +                                   optab_default);
> > +     break;
> >
> > -    case FLOAT_EXPR:
> > -      c1 = VEC_PACK_FLOAT_EXPR;
> > -      optab1 = optab_for_tree_code (c1, vectype, optab_default);
> > -      break;
> > +   case FLOAT_EXPR:
> > +     c1 = VEC_PACK_FLOAT_EXPR;
> > +     optab1 = optab_for_tree_code ((tree_code)c1, vectype_out,
> > +                                   optab_default);
> > +     break;
> >
> > -    default:
> > -      gcc_unreachable ();
> > +   default:
> > +     gcc_unreachable ();
> > +   }
> >      }
> > +  else
> > +    return false;
> >
> >    if (!optab1)
> >      return false;
> >
> > -  vec_mode = TYPE_MODE (vectype);
> > -  if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing)
> > -    return false;
> > +  if (narrowing_fn_p (code))
> > +    {
> > +      if (!optab2)
> > +   return false;
> > +      if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing
> > +     || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing)
> > +   return false;
> > +    }
> > +  else
> > +    {
> > +      if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing)
> > +   return false;
> >
> > -  *code1 = c1;
> > +      *code1 = c1;
> > +    }
> >
> > -  if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
> > +  machine_mode nmode;
> > +  machine_mode vmode = TYPE_MODE (narrow_vectype);
> > +  scalar_mode emode = GET_MODE_INNER (vmode);
> > +  poly_uint64 hnunits;
> > +  if (insn_data[icode1].operand[0].mode == vmode
> > +      || (narrowing_fn_p (code)
> > +     && known_ne (hnunits = exact_div (GET_MODE_NUNITS (vmode), 2U),
> 0U)
> > +     && related_vector_mode (vmode, emode, hnunits).exists (&nmode)
> > +     && insn_data[icode1].operand[0].mode == nmode
> > +     && insn_data[icode2].operand[0].mode == vmode))
> >      {
> >        if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> >     return true;
> > @@ -13716,7 +13812,7 @@ supportable_narrowing_operation (code_helper
> code,
> >        intermediate_type
> >     = lang_hooks.types.type_for_mode (TYPE_MODE (vectype_out), 0);
> >        interm_optab
> > -   = optab_for_tree_code (c1, intermediate_type, optab_default);
> > +   = optab_for_tree_code ((tree_code)c1, intermediate_type, optab_default);
> >        if (interm_optab != unknown_optab
> >       && (icode2 = optab_handler (optab1, vec_mode)) != CODE_FOR_nothing
> >       && insn_data[icode1].operand[0].mode
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index
> 3d8a9466982a0c29099e60ed7a84e0f5ed207fa9..026dfb131b4c2808290fdbd0
> 15b63dab5918c7f2 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -2463,8 +2463,8 @@ extern bool supportable_widening_operation
> (vec_info*, code_helper,
> >                                         code_helper*, code_helper*,
> >                                         int*, vec<tree> *);
> >  extern bool supportable_narrowing_operation (code_helper, tree, tree,
> > -                                        code_helper *, int *,
> > -                                        vec<tree> *);
> > +                                        code_helper *, code_helper *,
> > +                                        int *, vec<tree> *);
> >  extern bool supportable_indirect_convert_operation (code_helper,
> >                                                 tree, tree,
> >                                                 vec<std::pair<tree, 
> > tree_code> >
> &,
> >
> >
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
