> Am 25.01.2025 um 10:47 schrieb Jakub Jelinek <ja...@redhat.com>:
> 
> Hi!
> 
> We already do this canonicalization in
> simplify_using_ranges::simplify_div_or_mod_using_ranges, but that means
> that it is not done at -O1 or when vrp is otherwise disabled, and that
> it can be done too late in some cases when e.g. the r8-2064
> "X / C1 op C2 into a simple range test." optimization triggers first.
> Note, for unsigned modulo we already have
> (simplify
>  (mod @0 (convert? (power_of_two_cand@1 @2)))
>  (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0))
> ...
> optimization which duplicates what
> simplify_using_ranges::simplify_div_or_mod_using_ranges
> does in case ranges aren't needed.
> 
> For GCC 16 I think we should improve the niters pattern recognition
> and handle even what r8-2064 comes with, after all as I've tried to show
> in the PR the user could have written it that way.
> I've guarded this optimization on #if GIMPLE just in case this would stand
> in any way to the various divmult etc. simplification, guess that can be
> lifted for GCC 16 too.  In the modulo case we also handle
> unsigned % (power_of_two << n), but not really sure if we could do that
> for the division, because unsigned / (power_of_two << n) is not simple
> unsigned >> (log2 (power_of_two) + n), one can shift the bit out and then
> it becomes just 0.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Thanks,
Richard 

> 2025-01-25  Jakub Jelinek  <ja...@redhat.com>
> 
>    PR tree-optimization/118637
>    * match.pd: Canonicalize unsigned division by power of two to
>    right shift.
> 
>    * gcc.dg/tree-ssa/pr118637.c: New test.
> 
> --- gcc/match.pd.jj    2025-01-22 00:22:30.135367448 +0100
> +++ gcc/match.pd    2025-01-24 15:16:26.893426439 +0100
> @@ -965,6 +965,15 @@ (define_operator_list SYNC_FETCH_AND_AND
>   (bit_and @0 (negate @1))))
> 
> (for div (trunc_div ceil_div floor_div round_div exact_div)
> +#if GIMPLE
> + /* Canonicalize unsigned t / 4 to t >> 2.  */
> + (simplify
> +  (div @0 integer_pow2p@1)
> +  (if (INTEGRAL_TYPE_P (type)
> +       && (TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0)))
> +   (rshift @0 { build_int_cst (integer_type_node,
> +                   wi::exact_log2 (wi::to_wide (@1))); })))
> +#endif
>  /* Simplify (t * u) / u -> t.  */
>  (simplify
>   (div (mult:c @0 @1) @1)
> --- gcc/testsuite/gcc.dg/tree-ssa/pr118637.c.jj    2025-01-24 
> 16:09:44.821939764 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr118637.c    2025-01-24 16:11:20.653582854 
> +0100
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/118637 */
> +/* { dg-do compile { target clz } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 2 "optimized" } 
> } */
> +
> +__attribute__((noipa)) unsigned
> +foo (unsigned x)
> +{
> +  unsigned result = 0;
> +  while (x /= 2)
> +    ++result;
> +  return result;
> +}
> +
> +__attribute__((noipa)) unsigned
> +bar (unsigned x)
> +{
> +  unsigned result = 0;
> +  while (x >>= 1)
> +    ++result;
> +  return result;
> +}
> 
>    Jakub
> 

Reply via email to