https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118637

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 24 Jan 2025, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118637
> 
> --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> I guess the first question is why do we optimize unsigned / 2 into unsigned >>
> 1 during GIMPLE opts only in VRP (evrp etc.) as can be seen with just 
> compiling
> it with -O1, even optimized dump has
>   x_9 = x_3(D) / 2;
> and only expansion turns that into
> (insn 13 12 0 (parallel [
>             (set (reg/v:SI 99 [ x ])
>                 (lshiftrt:SI (reg/v:SI 101 [ x ])
>                     (const_int 1 [0x1])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr118637.c":5:12 -1
>      (expr_list:REG_EQUAL (udiv:SI (reg/v:SI 101 [ x ])
>             (const_int 2 [0x2]))
>         (nil)))
> So e.g. with -O1 we don't optimize
> void link_error (void);
> 
> void
> foo (unsigned x)
> {
>   if (x / 2 != x >> 1)
>     link_error ();
> }
> until RTL.

I'd say it's a missed canonicalization.  For signed we have different
UB (when x is negative).

> The r8-2064-g8d1628eb33d4f5 optimization is sane IMHO; and user could have
> written it that way as well.
> 
> Consider
> unsigned
> foo (unsigned x)
> {
>   unsigned result = 0;
>   while (x /= 2)
>     ++result;
>   return result;
> }
> 
> unsigned
> bar (unsigned x)
> {
>   unsigned result = 0;
>   while (x >>= 1)
>     ++result;
>   return result;
> }
> 
> unsigned
> baz (unsigned x)
> {
>   unsigned result = 0, t;
>   while ((t = x, x /= 2, t > 1))
>     ++result;
>   return result;
> }
> 
> unsigned
> qux (unsigned x)
> {
>   unsigned result = 0, t;
>   while ((t = x, x >>= 1, t > 1))
>     ++result;
>   return result;
> }
> 
> unsigned
> fred (unsigned x)
> {
>   unsigned result = 0;
>   while (x > 1)
>     x /= 2, ++result;
>   return result;
> }
> 
> unsigned
> corge (unsigned x)
> {
>   unsigned result = 0;
>   while (x > 1)
>     ++result, x >>= 1;
>   return result;
> }
> I think all of these compute the same thing, but we only pattern recognize the
> first one and clang only pattern recognizes the first two.

Improving pattern recog would be also OK.

If we'd want to fix for GCC 15 I'd probably prefer to canonicalize
x / power-of-two-CST to x >> log(CST) given that's what VRP will
do anyway.

Reply via email to