On Tue, Aug 20, 2024 at 9:46 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Andrew Pinski <quic_apin...@quicinc.com> writes:
> > On aarch64 (without CSSC instructions), popcount is implemented using
> > the SIMD instruction cnt, so instead of using two SIMD cnt (V8QI mode)
> > it is better to use one 128-bit cnt (V16QI mode), with only one
> > reduction addition instead of two.  Currently fold_builtin_bit_query
> > always expands without checking whether there is an optab for the type,
> > so this changes it to check the optab to decide whether to expand or
> > let the backend handle it.
> >
> > Bootstrapped and tested on x86_64-linux-gnu and built and tested for
> > aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> >       * builtins.cc (fold_builtin_bit_query): Don't expand double
> >       `unsigned long long` types if there is an optab entry for that
> >       type.
>
> OK.  The logic in the function seems a bit twisty (the same condition
> is checked later), but all my attempts to improve it only made it worse.

I also looked for a good refactoring here, but I didn't see one either.
Anyway, I have now pushed it as
r15-3056-g50b5000a5e430aaf99a5e00465cc9e25563d908b.

Thanks,
Andrew

>
> Thanks,
> Richard
>
> >
> > Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>
> > ---
> >  gcc/builtins.cc | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index 0b902896ddd..b4d51eaeba5 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
> >    tree call = NULL_TREE, tem;
> >    if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
> >        && (TYPE_PRECISION (arg0_type)
> > -       == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
> > +       == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
> > +      /* If the target supports the optab, then don't do the expansion. */
> > +      && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
> >      {
> >        /* __int128 expansions using up to 2 long long builtins.  */
> >        arg0 = save_expr (arg0);
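
For readers following the thread, here is a minimal sketch of what the "__int128 expansions using up to 2 long long builtins" fallback amounts to for popcount, written as ordinary C rather than as GCC tree code. The function names `popcount128_split`, `popcount128_naive` and `popcount128_selfcheck` are hypothetical and exist only for this illustration; the sketch assumes a GCC-compatible compiler with `unsigned __int128` and a 64-bit `unsigned long long`. With the patch, this split form is skipped when the target reports direct support for the operation on the 128-bit type, so the backend can emit a single V16QI cnt plus one reduction instead.

```c
#include <assert.h>

/* Hypothetical illustration, not GCC source: popcount of a 128-bit value
   computed as two 64-bit popcounts, one per half, then summed.  */
static int
popcount128_split (unsigned __int128 x)
{
  unsigned long long lo = (unsigned long long) x;         /* low 64 bits */
  unsigned long long hi = (unsigned long long) (x >> 64); /* high 64 bits */
  return __builtin_popcountll (hi) + __builtin_popcountll (lo);
}

/* Reference implementation: count set bits one at a time.  */
static int
popcount128_naive (unsigned __int128 x)
{
  int count = 0;
  while (x != 0)
    {
      count += (int) (x & 1);
      x >>= 1;
    }
  return count;
}

/* Check that the split form agrees with the bit-by-bit count.  */
static void
popcount128_selfcheck (void)
{
  unsigned __int128 x = ((unsigned __int128) 0xF0F0ULL << 64) | 0xFFULL;
  assert (popcount128_split (x) == popcount128_naive (x));
  assert (popcount128_split (0) == 0);
}
```

On a target without a 128-bit popcount pattern, each half costs its own popcount and the two results still have to be added, which is the extra work the patch description refers to.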
