On Mon, Oct 30, 2023 at 1:23 PM <[email protected]> wrote:
>
> From: Pan Li <[email protected]>
>
> Update in v3:
>
> * Add a function that checks whether the type size is legal for the
>   call vectorizer.
>
> Update in v2:
>
> * Fix an ICE caused by a type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> vectorizable_call restricts the sizes of the data types involved: DF to
> DI is allowed but SF to DI isn't. You may see the messages below when
> trying to vectorize a function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> As a result, a standard name pattern like lrintmn2 cannot work when the
> data type sizes differ, e.g. SF => DI. This patch refines the data type
> size check and unblocks standard names like lrintmn2 under certain conditions.
>
> The type size of vectype_out needs to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. There is no such restriction when vectype_out
> is part of the optab query.
>
> The tests below pass with this patch.
>
> * The x86 bootstrap and regression tests.
> * The aarch64 regression tests.
> * The risc-v regression tests.
> * The lrintf standard name on risc-v.
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
> function that checks whether the type size is legal.
> (vectorizable_call): Leverage vectorizable_type_size_legal_p.
>
> Signed-off-by: Pan Li <[email protected]>
> ---
> gcc/tree-vect-stmts.cc | 51 +++++++++++++++++++++++++++++++-----------
> 1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..24b3448d961 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
> return IFN_LAST;
> }
>
> +/* Return TRUE when the type size is legal for the call vectorizer,
> + or FALSE otherwise.
> + The type sizes of vectype_in and vectype_out must be exactly the
> + same when vectype_out isn't participating in the optab query.
> + There is no type size restriction when vectype_out is part of
> + the optab query.
> + */
> +static bool
> +vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
> + tree vectype_in)
> +{
> + bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
> +
> + if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
> + return same_size_p;
> +
> + const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
> +
> + if (!difn_info.vectorizable)
> + return same_size_p;
> +
> + /* According to vectorizable_internal_function, type0/1 < 0 indicates
> + that vectype_out participates in the optab selection, so the type
> + size check can be skipped here. */
> + if (difn_info.type0 < 0 || difn_info.type1 < 0)
> + return true;
Can you instead amend vectorizable_internal_function to contain this check,
returning IFN_LAST if it doesn't hold?
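
A minimal standalone sketch of how that suggestion might look. It is a toy
model and not GCC code: ModelFn, ModelVecType, ModelFnInfo and
model_vectorizable_internal_function are hypothetical names invented for
illustration, ModelFn::None stands in for IFN_LAST, and the target-support
query that the real function performs is left out.

```cpp
// Toy model only: every type and function here is a hypothetical stand-in,
// not part of GCC's tree or internal_fn API.
enum class ModelFn { None, Some };   // None plays the role of IFN_LAST.

struct ModelVecType
{
  unsigned size_bits;   // Models TYPE_SIZE of a vector type.
};

struct ModelFnInfo
{
  int type0;            // < 0: taken from the output vector type.
  int type1;            // < 0: taken from the output vector type.
  bool vectorizable;
};

// Return FN when the call may be vectorized through it, else ModelFn::None.
// The size check lives inside the lookup, so a size mismatch looks to the
// caller like any other unsupported function.
ModelFn
model_vectorizable_internal_function (ModelFn fn, const ModelFnInfo &info,
                                      ModelVecType out, ModelVecType in)
{
  if (fn == ModelFn::None || !info.vectorizable)
    return ModelFn::None;

  // The output type is part of the query when either type index is negative.
  bool out_in_query = info.type0 < 0 || info.type1 < 0;

  // Without the output type in the query, input and output sizes must match.
  if (!out_in_query && out.size_bits != in.size_bits)
    return ModelFn::None;

  return fn;
}
```

With the check folded in like this, the caller would no longer need a
separate size predicate for the internal-function path.
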
> +
> + return same_size_p;
> +}
>
> static tree permute_vec_elements (vec_info *, tree, tree, tree,
> stmt_vec_info,
> gimple_stmt_iterator *);
> @@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
>
> return false;
> }
> - /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz*
> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> - by a pack of the two vectors into an SI vector. We would need
> - separate code to handle direct VnDI->VnSI IFN_CTZs. */
> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> - {
> - if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "mismatched vector sizes %T and %T\n",
> - vectype_in, vectype_out);
> - return false;
> - }
>
> if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
> != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> @@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
> ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> vectype_in);
>
> + if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "mismatched vector sizes %T and %T\n",
> + vectype_in, vectype_out);
> + return false;
> + }
> +
> /* If that fails, try asking for a target-specific built-in function. */
> if (ifn == IFN_LAST)
> {
> --
> 2.34.1
>