On Tue, Dec 31, 2019 at 05:47:54PM +0100, Richard Biener wrote:
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Ok.
Thanks.

> >One thing I haven't done anything about yet is that there is
> >FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
> >before/after this patch with -m32/-march=skylake-avx512.  That is because
> >the popcountll effective target tests that we don't emit a call for
> >__builtin_popcountll, which we don't on ia32 skylake-avx512, but
> >direct_internal_fn_supported_p isn't true - that is because we expand the
> >double word popcount using 2 word popcounts + addition.  Shall the match.pd
> >case handle that case too by allowing the optimization even if there is a
> >type with half precision for which direct_internal_fn_supported_p?
>
> You mean emitting a single builtin call
> Or an add of two ifns?

I meant to do in the match.pd condition what expand_unop will do, i.e.
-         && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-                                            OPTIMIZE_FOR_BOTH))
+         && (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
+                                             OPTIMIZE_FOR_BOTH)
+             /* expand_unop can handle double-word popcount using
+                two word popcounts and addition.  */
+             || (TREE_CODE (type) == INTEGER_TYPE
+                 && TYPE_PRECISION (type) == 2 * BITS_PER_WORD
+                 && (optab_handler (popcount_optab, word_mode)
+                     != CODE_FOR_nothing))))
or so.

        Jakub
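
For illustration only, here is roughly what the double-word expansion mentioned above amounts to at the source level. This is a standalone sketch, not GCC internals: the function name popcount64_via_words is made up for this example, and it assumes a 32-bit word and a 32-bit unsigned int, as on ia32.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GCC): count the set bits of a
   double-word (64-bit) value using two word-sized (32-bit) popcounts
   and one addition.  Assumes unsigned int is 32 bits wide.  */
static int
popcount64_via_words (uint64_t x)
{
  uint32_t lo = (uint32_t) x;          /* low word */
  uint32_t hi = (uint32_t) (x >> 32);  /* high word */

  /* __builtin_popcount takes an unsigned int argument.  */
  return __builtin_popcount (lo) + __builtin_popcount (hi);
}
```

The result should agree with __builtin_popcountll for every input, which is why the match.pd condition could plausibly accept this case even when there is no direct double-word popcount pattern.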