Andrew Pinski <quic_apin...@quicinc.com> writes:
> On aarch64 (without CSSC instructions), since popcount is implemented using
> the SIMD instruction cnt, instead of using two SIMD cnt (V8QI mode), it is
> better to use one 128bit cnt (V16QI mode). And only one reduction addition
> instead of 2. Currently fold_builtin_bit_query will expand always without
> checking if there was an optab for the type, so this changes that to check
> the optab to see if we should expand or have the backend handle it.
>
> Bootstrapped and tested on x86_64-linux-gnu and built and tested for
> aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>       * builtins.cc (fold_builtin_bit_query): Don't expand double
>       `unsigned long long` types if there is an optab entry for that
>       type.

OK.  The logic in the function seems a bit twisty (the same condition
is checked later), but all my attempts to improve it only made it worse.

Thanks,
Richard

>
> Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>
> ---
>  gcc/builtins.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 0b902896ddd..b4d51eaeba5 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
>        tree call = NULL_TREE, tem;
>        if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
>            && (TYPE_PRECISION (arg0_type)
> -              == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
> +              == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
> +          /* If the target supports the optab, then don't do the expansion.  */
> +          && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
>          {
>            /* __int128 expansions using up to 2 long long builtins.  */
>            arg0 = save_expr (arg0);
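
For readers following along, here is a rough source-level picture of what
the "__int128 expansions using up to 2 long long builtins" comment refers
to for popcount.  This is only an illustrative sketch, not the code in
builtins.cc: the helper name popcount128_split is made up, and it assumes
a 64-bit target where GCC provides unsigned __int128 and a 64-bit
unsigned long long.

```c
/* Hypothetical illustration, not GCC source: popcount of a 128-bit
   value computed as two 64-bit popcounts plus one addition.  This is
   the shape of the generic expansion that the patch skips when the
   target supports the operation directly for the 128-bit type.  */
static inline int
popcount128_split (unsigned __int128 x)
{
  /* Assumes unsigned long long is 64 bits wide.  */
  unsigned long long lo = (unsigned long long) x;
  unsigned long long hi = (unsigned long long) (x >> 64);

  /* Each call counts the set bits in one 64-bit half.  */
  return __builtin_popcountll (lo) + __builtin_popcountll (hi);
}
```

With the patch, when direct_internal_fn_supported_p reports support for
the 128-bit type, the fold is not split like this and the call is left
for the backend, which per the commit message can then use a single
V16QI cnt and one reduction on aarch64.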