https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118637
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> --- On Fri, 24 Jan 2025, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118637 > > --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > I guess the first question is why do we optimize unsigned / 2 into unsigned >> > 1 during GIMPLE opts only in VRP (evrp etc.) as can be seen with just > compiling > it with -O1, even optimized dump has > x_9 = x_3(D) / 2; > and only expansion turns that into > (insn 13 12 0 (parallel [ > (set (reg/v:SI 99 [ x ]) > (lshiftrt:SI (reg/v:SI 101 [ x ]) > (const_int 1 [0x1]))) > (clobber (reg:CC 17 flags)) > ]) "pr118637.c":5:12 -1 > (expr_list:REG_EQUAL (udiv:SI (reg/v:SI 101 [ x ]) > (const_int 2 [0x2])) > (nil))) > So e.g. with -O1 we don't optimize > void link_error (void); > > void > foo (unsigned x) > { > if (x / 2 != x >> 1) > link_error (); > } > until RTL. I'd say it's a missed canonicalization. For signed we have different UB (when x is negative). > The r8-2064-g8d1628eb33d4f5 optimization is sane IMHO; and user could have > written it that way as well. > > Consider > unsigned > foo (unsigned x) > { > unsigned result = 0; > while (x /= 2) > ++result; > return result; > } > > unsigned > bar (unsigned x) > { > unsigned result = 0; > while (x >>= 1) > ++result; > return result; > } > > unsigned > baz (unsigned x) > { > unsigned result = 0, t; > while ((t = x, x /= 2, t > 1)) > ++result; > return result; > } > > unsigned > qux (unsigned x) > { > unsigned result = 0, t; > while ((t = x, x >>= 1, t > 1)) > ++result; > return result; > } > > unsigned > fred (unsigned x) > { > unsigned result = 0; > while (x > 1) > x /= 2, ++result; > return result; > } > > unsigned > corge (unsigned x) > { > unsigned result = 0; > while (x > 1) > ++result, x >>= 1; > return result; > } > I think all of these compute the same thing, but we only pattern recognize the > first one and clang only pattern recognizes the first two. Improving pattern recog would be also OK. If we'd want to fix for GCC 15 I'd probably prefer to canonicalize x / power-of-two-CST to x >> log(CST) given that's what VRP will do anyway.