On Tue, Dec 31, 2019 at 05:47:54PM +0100, Richard Biener wrote:
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Ok.
Thanks.
> >One thing I haven't done anything about yet is that there is
> >FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
> >before/after this patch with -m32/-march=skylake-avx512. That is because
> >the popcountll effective target tests that we don't emit a call for
> >__builtin_popcountll, which we don't on ia32 skylake-avx512, but
> >direct_internal_fn_supported_p isn't true - that is because we expand the
> >double word popcount using 2 word popcounts + addition. Shall the match.pd
> >case handle that case too by allowing the optimization even if there is a
> >type with half precision for which direct_internal_fn_supported_p?
>
> You mean emitting a single builtin call,
> or an add of two ifns?
I meant to do in the match.pd condition what expand_unop will do, i.e.
-          && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-                                             OPTIMIZE_FOR_BOTH))
+          && (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
+                                              OPTIMIZE_FOR_BOTH)
+              /* expand_unop can handle double-word popcount using
+                 two word popcounts and addition.  */
+              || (TREE_CODE (type) == INTEGER_TYPE
+                  && TYPE_PRECISION (type) == 2 * BITS_PER_WORD
+                  && (optab_handler (popcount_optab, word_mode)
+                      != CODE_FOR_nothing))))
or so.
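
For illustration only, the double-word expansion mentioned above amounts
to roughly the following at the C level. This is a minimal sketch, not
GCC code: popcount_double_word is a hypothetical name, and it assumes a
32-bit word (as with -m32) and a 64-bit double word.

```c
#include <stdint.h>

/* Hypothetical helper, not part of GCC.  Assumes a 32-bit word:
   the 64-bit popcount is computed as two word-sized popcounts
   followed by an addition.  */
static int
popcount_double_word (uint64_t x)
{
  uint32_t lo = (uint32_t) x;          /* low word */
  uint32_t hi = (uint32_t) (x >> 32);  /* high word */
  return __builtin_popcount (lo) + __builtin_popcount (hi);
}
```

The point is that no call to a 64-bit popcount routine is needed as long
as the word-sized popcount is available, which is the situation the
extra condition is meant to recognize.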
Jakub