Andrew Pinski <quic_apin...@quicinc.com> writes:
> On aarch64 (without CSSC instructions), popcount is implemented using the
> SIMD instruction cnt.  Instead of using two SIMD cnt instructions (V8QI
> mode), it is better to use one 128-bit cnt (V16QI mode) and only one
> reduction addition instead of two.  Currently fold_builtin_bit_query always
> expands without checking whether there is an optab for the type, so this
> changes it to check the optab to decide whether we should expand or have
> the backend handle it.
>
> Bootstrapped and tested on x86_64-linux-gnu and built and tested for
> aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>       * builtins.cc (fold_builtin_bit_query): Don't expand double
>       `unsigned long long` types if there is an optab entry for that
>       type.

OK.  The logic in the function seems a bit twisty (the same condition
is checked later), but all my attempts to improve it only made it worse.

Thanks,
Richard

>
> Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>
> ---
>  gcc/builtins.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 0b902896ddd..b4d51eaeba5 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
>    tree call = NULL_TREE, tem;
>    if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
>        && (TYPE_PRECISION (arg0_type)
> -       == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
> +       == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
> +      /* If the target supports the optab, then don't do the expansion. */
> +      && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
>      {
>        /* __int128 expansions using up to 2 long long builtins.  */
>        arg0 = save_expr (arg0);
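
For readers unfamiliar with the generic expansion being skipped here, the
following is a rough source-level sketch of what "up to 2 long long builtins"
amounts to for popcount: the 128-bit value is split into two 64-bit halves,
each half is counted separately, and the two counts are added.  The helper
names below are hypothetical and are not GCC functions; the sketch assumes a
compiler that provides unsigned __int128 and __builtin_popcountll (GCC and
Clang on 64-bit targets do).  It does not show the code GCC actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): popcount of a 128-bit value
   computed as two 64-bit popcounts plus one addition, mirroring the
   split that the generic expansion performs.  */
static int
popcount128_split (unsigned __int128 x)
{
  uint64_t lo = (uint64_t) x;          /* low 64 bits */
  uint64_t hi = (uint64_t) (x >> 64);  /* high 64 bits */
  return __builtin_popcountll (hi) + __builtin_popcountll (lo);
}

/* Hypothetical reference implementation: clear the lowest set bit
   until the value is zero, counting the iterations.  */
static int
popcount128_reference (unsigned __int128 x)
{
  int n = 0;
  while (x != 0)
    {
      x &= x - 1;
      n++;
    }
  return n;
}

/* Hypothetical self-check: both helpers must agree on X.  */
static void
check_popcount128 (unsigned __int128 x)
{
  assert (popcount128_split (x) == popcount128_reference (x));
}
```

With the patch, this split is not done at fold time when
direct_internal_fn_supported_p reports that the target supports the operation
directly on the 128-bit type, so the backend can pick a single wider
operation instead.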
