Hi!

We already do this canonicalization in
simplify_using_ranges::simplify_div_or_mod_using_ranges, but that means
that it is not done at -O1 or when vrp is otherwise disabled, and that
it can be done too late in some cases when e.g. the r8-2064
"X / C1 op C2 into a simple range test." optimization triggers first.
Note, for unsigned modulo we already have
 (simplify
  (mod @0 (convert? (power_of_two_cand@1 @2)))
  (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0))
...
optimization which duplicates what
simplify_using_ranges::simplify_div_or_mod_using_ranges
does in case ranges aren't needed.

For GCC 16 I think we should improve the niters pattern recognition
and handle even what r8-2064 comes with, after all as I've tried to show
in the PR the user could have written it that way.
I've guarded this optimization on #if GIMPLE just in case this would stand
in any way to the various divmult etc. simplification, guess that can be
lifted for GCC 16 too.  In the modulo case we also handle
unsigned % (power_of_two << n), but not really sure if we could do that
for the division, because unsigned / (power_of_two << n) is not simple
unsigned >> (log2 (power_of_two) + n), one can shift the bit out and then
it becomes just 0.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-25  Jakub Jelinek  <ja...@redhat.com>

        PR tree-optimization/118637
        * match.pd: Canonicalize unsigned division by power of two to
        right shift.

        * gcc.dg/tree-ssa/pr118637.c: New test.

--- gcc/match.pd.jj     2025-01-22 00:22:30.135367448 +0100
+++ gcc/match.pd        2025-01-24 15:16:26.893426439 +0100
@@ -965,6 +965,15 @@ (define_operator_list SYNC_FETCH_AND_AND
   (bit_and @0 (negate @1))))
 
 (for div (trunc_div ceil_div floor_div round_div exact_div)
+#if GIMPLE
+ /* Canonicalize unsigned t / 4 to t >> 2.  */
+ (simplify
+  (div @0 integer_pow2p@1)
+  (if (INTEGRAL_TYPE_P (type)
+       && (TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0)))
+   (rshift @0 { build_int_cst (integer_type_node,
+                              wi::exact_log2 (wi::to_wide (@1))); })))
+#endif
  /* Simplify (t * u) / u -> t.  */
  (simplify
   (div (mult:c @0 @1) @1)
--- gcc/testsuite/gcc.dg/tree-ssa/pr118637.c.jj 2025-01-24 16:09:44.821939764 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr118637.c    2025-01-24 16:11:20.653582854 
+0100
@@ -0,0 +1,22 @@
+/* PR tree-optimization/118637 */
+/* { dg-do compile { target clz } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 2 "optimized" } } 
*/
+
+__attribute__((noipa)) unsigned
+foo (unsigned x)
+{
+  unsigned result = 0;
+  while (x /= 2)
+    ++result;
+  return result;
+}
+
+__attribute__((noipa)) unsigned
+bar (unsigned x)
+{
+  unsigned result = 0;
+  while (x >>= 1)
+    ++result;
+  return result;
+}

        Jakub

Reply via email to