On Tue, 13 Aug 2019, Wilco Dijkstra wrote:
Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and popcount (x) == 1 to (x-1) <u (x & -x). These trigger only for single-use cases and support an optional convert. A microbenchmark shows a speedup of 2-2.5x on both x64 and AArch64.
Is that true even on targets that have a popcount instruction? (-mpopcnt for x64)
diff --git a/gcc/match.pd b/gcc/match.pd
index 0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      rep (eq eq ne ne)
   (simplify
    (cmp (popcount @0) integer_zerop)
-   (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))
+   (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))
+  /* popcount(X) == 1 -> (X-1) <u (X & -X). */
+  (for cmp (eq ne)
+       rep (lt ge)
+   (simplify
+    (cmp (convert? (popcount:s @0)) integer_onep)
+    (with {
+      tree utype = unsigned_type_for (TREE_TYPE (@0));
+      tree a0 = fold_convert (utype, @0); }
That doesn't seem right for a gimple transformation. I assume you didn't write (convert:utype @0) in the output because you want to avoid doing it 3 times? IIRC you are allowed to write (convert:utype@1 @0) in the output and reuse @1 several times.
+     (rep (plus { a0; } { build_minus_one_cst (utype); })
+          (bit_and (negate { a0; }) { a0; })))))
+  /* popcount(X) > 1 -> (X & (X-1)) != 0. */
+  (for cmp (gt le)
+       rep (ne eq)
+   (simplify
+    (cmp (convert? (popcount:s @0)) integer_onep)
+    (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0)
+         { build_zero_cst (TREE_TYPE (@0)); }))))
Are there any types where this could be a problem? Say if you cast to a 1-bit type. Actually, even converting popcnt(__uint128_t(-1)) to signed char may be problematic.
-- Marc Glisse