https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> --- I think it looks too complex to do it in match.pd, wouldn't it be better to do it say in tree-ssa-math-opts.c and thus just once? That said, due to the x & -x it is really bound to a small number of values that need to be checked, so shouldn't be that expensive. One question is if it should start from the x & -x operation and go through immediate uses to a table lookup, or if we start from the table lookup and follow index defs to the x & -x operation, going through a small limited number of casts and *, >> and % operations (with constant last operands). Guess we need to also compute approximate costs of the originally used sequence and compare that to the cost of ctz (and only do it if there is hw ctz). And decide if we emit x ? ctz (x) : const; or something different. This is GIMPLE, so __builtin_ctz* is undefined at 0 unconditionally, it is up to the RTL/backends then to optimize the conditionally called ctz into unconditional one if the value of ctz (0) in hw is equal or related to the chosen value.