On Tue, 31 Dec 2019, Jakub Jelinek wrote:

> On Tue, Dec 31, 2019 at 05:47:54PM +0100, Richard Biener wrote:
> > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > Ok.
>
> Thanks.
>
> > >One thing I haven't done anything about yet is that there is
> > >FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
> > >before/after this patch with -m32/-march=skylake-avx512.  That is because
> > >the popcountll effective target tests that we don't emit a call for
> > >__builtin_popcountll, which we don't on ia32 skylake-avx512, but
> > >direct_internal_fn_supported_p isn't true - that is because we expand the
> > >double word popcount using 2 word popcounts + addition.  Shall the match.pd
> > >case handle that case too by allowing the optimization even if there is a
> > >type with half precision for which direct_internal_fn_supported_p?
> >
> > You mean emitting a single builtin call
> > Or an add of two ifns?
>
> I meant to do in the match.pd condition what expand_unop will do, i.e.
> -         && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
> -                                            OPTIMIZE_FOR_BOTH))
> +         && (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
> +                                             OPTIMIZE_FOR_BOTH)
> +             /* expand_unop can handle double-word popcount using
> +                two word popcounts and addition.  */
> +             || (TREE_CODE (type) == INTEGRAL_TYPE
> +                 && TYPE_PRECISION (type) == 2 * BITS_PER_WORD
> +                 && (optab_handler (popcount_optab, word_mode)
> +                     != CODE_FOR_nothing))))
> or so.
OK, that would work for me (maybe add a predicate to the optabs code close
to the actual expander).

Richard.