On Tue, Dec 31, 2019 at 05:47:54PM +0100, Richard Biener wrote:
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Ok. 

Thanks.

> >One thing I haven't done anything about yet is that there is
> >FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
> >before/after this patch with -m32/-march=skylake-avx512.  That is because
> >the popcountll effective target tests that we don't emit a call for
> >__builtin_popcountll, which we don't on ia32 skylake-avx512, but
> >direct_internal_fn_supported_p isn't true - that is because we expand the
> >double word popcount using 2 word popcounts + addition.  Shall the match.pd
> >case handle that case too, by also allowing the optimization when there is
> >a type with half precision for which direct_internal_fn_supported_p is true?
> 
> You mean emitting a single builtin call or an add of two ifns?

I meant to do in the match.pd condition what expand_unop will do, i.e.
-       && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-                                          OPTIMIZE_FOR_BOTH))
+       && (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
+                                           OPTIMIZE_FOR_BOTH)
+           /* expand_unop can handle double-word popcount using
+              two word popcounts and addition.  */
+           || (TREE_CODE (type) == INTEGER_TYPE
+               && TYPE_PRECISION (type) == 2 * BITS_PER_WORD
+               && (optab_handler (popcount_optab, word_mode)
+                   != CODE_FOR_nothing))))
or so.
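
For illustration only (this is not part of the patch): the double-word
expansion referred to above amounts to splitting the value into two word-sized
halves, taking the popcount of each, and adding the results.  A minimal C
sketch, assuming a 32-bit word and a 32-bit unsigned int; the helper name
popcount64_by_halves is made up for this example:

```c
#include <stdint.h>

/* Hypothetical helper, not GCC code: compute the popcount of a 64-bit
   value as the sum of two 32-bit popcounts, mirroring the
   "2 word popcounts + addition" expansion on a 32-bit target.
   Assumes unsigned int is 32 bits wide.  */
static int
popcount64_by_halves (uint64_t x)
{
  uint32_t lo = (uint32_t) x;          /* low word */
  uint32_t hi = (uint32_t) (x >> 32);  /* high word */
  return __builtin_popcount (lo) + __builtin_popcount (hi);
}
```

For any x this should agree with __builtin_popcountll (x), which is why the
match.pd condition could accept the double-word case when only the word-mode
popcount optab is available.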

        Jakub
