On Tue, 13 Aug 2019, Wilco Dijkstra wrote:

Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and
popcount (x) == 1 into (x-1) <u (x & -x).  These trigger only for
single-use cases and support an optional convert.  A microbenchmark
shows a speedup of 2-2.5x on both x64 and AArch64.

Is that true even on targets that have a popcount instruction? (-mpopcnt for x64)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       rep (eq eq ne ne)
    (simplify
      (cmp (popcount @0) integer_zerop)
-      (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))
+      (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))
+  /* popcount(X) == 1 -> (X-1) <u (X & -X).  */
+  (for cmp (eq ne)
+       rep (lt ge)
+    (simplify
+      (cmp (convert? (popcount:s @0)) integer_onep)
+      (with {
+             tree utype = unsigned_type_for (TREE_TYPE (@0));
+             tree a0 = fold_convert (utype, @0); }

Calling fold_convert there doesn't seem right for a GIMPLE transformation. I assume you didn't write (convert:utype @0) in the output because you want to avoid doing it three times? IIRC you are allowed to write (convert:utype@1 @0) in the output and reuse @1 several times.

+       (rep (plus { a0; } { build_minus_one_cst (utype); })
+            (bit_and (negate { a0; }) { a0; })))))
+  /* popcount(X) > 1 -> (X & (X-1)) != 0.  */
+  (for cmp (gt le)
+       rep (ne eq)
+    (simplify
+      (cmp (convert? (popcount:s @0)) integer_onep)
+      (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0)
+          { build_zero_cst (TREE_TYPE (@0)); }))))

Are there any types where this could be a problem, say if you cast to a 1-bit type? Actually, even converting popcnt(__uint128_t(-1)) to signed char may be problematic, since that popcount is 128, which does not fit in signed char.

--
Marc Glisse
