On Tue, Aug 20, 2024 at 9:46 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Andrew Pinski <quic_apin...@quicinc.com> writes:
> > On aarch64 (without CSSC instructions), since popcount is implemented
> > using the SIMD instruction cnt, instead of using two SIMD cnt (V8QI
> > mode), it is better to use one 128bit cnt (V16QI mode). And only one
> > reduction addition instead of 2. Currently fold_builtin_bit_query will
> > expand always without checking if there was an optab for the type, so
> > this changes that to check the optab to see if we should expand or
> > have the backend handle it.
> >
> > Bootstrapped and tested on x86_64-linux-gnu and built and tested for
> > aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> >       * builtins.cc (fold_builtin_bit_query): Don't expand double
> >       `unsigned long long` types if there is an optab entry for that
> >       type.
>
> OK.  The logic in the function seems a bit twisty (the same condition
> is checked later), but all my attempts to improve it only made it worse.

I tried to find a good refactoring here too, but I didn't see one
either. Anyway, I have now pushed it as
r15-3056-g50b5000a5e430aaf99a5e00465cc9e25563d908b.

Thanks,
Andrew

>
> Thanks,
> Richard
>
> >
> > Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>
> > ---
> >  gcc/builtins.cc | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index 0b902896ddd..b4d51eaeba5 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
> >        tree call = NULL_TREE, tem;
> >        if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
> >           && (TYPE_PRECISION (arg0_type)
> > -             == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
> > +             == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
> > +         /* If the target supports the optab, then don't do the expansion.  */
> > +         && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
> >         {
> >           /* __int128 expansions using up to 2 long long builtins.  */
> >           arg0 = save_expr (arg0);
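
For anyone reading this thread later, here is a rough C sketch of what the
generic "__int128 expansion using up to 2 long long builtins" amounts to for
popcount. The helper name popcount128_split is hypothetical and is not part of
GCC. The sketch assumes a 64-bit target where the compiler provides
unsigned __int128 and __builtin_popcountll. It only illustrates the split that
the fold performs. With the patch, a target that reports direct support for
the 128-bit operation (as aarch64 does here) keeps the single call instead of
being split this way.

```c
/* Hypothetical illustration only, not GCC source.  Computes the population
   count of a 128-bit value as two 64-bit popcounts plus one addition, which
   is the shape of the generic expansion described in the patch.  */
static inline int
popcount128_split (unsigned __int128 x)
{
  /* Low and high 64-bit halves of the 128-bit argument.  */
  unsigned long long lo = (unsigned long long) x;
  unsigned long long hi = (unsigned long long) (x >> 64);

  /* Two separate 64-bit popcounts, then one scalar add.  */
  return __builtin_popcountll (hi) + __builtin_popcountll (lo);
}
```

On aarch64 without CSSC, each 64-bit popcount in this split form would use its
own V8QI cnt and reduction, which is the cost the patch description says a
single V16QI cnt avoids.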