https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
Aliaksei Kandratsenka <alkondratenko at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alkondratenko at gmail dot com

--- Comment #10 from Aliaksei Kandratsenka <alkondratenko at gmail dot com> ---
There is a similar issue with bsr and __builtin_clz. It looks like for
__builtin_clz gcc computes 31 - <bsr-result>. And 31 - __builtin_clz does get
optimized to a plain bsr, but only under -march=haswell or later AMD CPUs.
For earlier CPUs it generates 2 redundant 31 - arg computations.

This is easy to play with at: https://godbolt.org/g/o7gNSS

Clang doesn't have that same problem (but it has another one: under
-march=haswell it sometimes prefers lzcnt too strongly, which returns a
different result and thus requires extra computation).
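
For readers without the godbolt link handy, here is a minimal sketch of the pattern described above. It is not the code behind that link: the function names are made up for illustration, and it assumes a 32-bit unsigned int and a nonzero argument (__builtin_clz is undefined for 0).

```c
#include <assert.h>

/* Hypothetical helper (not from the bug report): index of the highest set
   bit of a nonzero 32-bit value, which is what bsr yields for that input.
   Written as 31 - __builtin_clz(x), the form discussed in the comment. */
static inline int highest_set_bit(unsigned int x)
{
    /* __builtin_clz(0) is undefined, so callers must pass x != 0. */
    return 31 - __builtin_clz(x);
}

/* Portable reference used only to cross-check the builtin version:
   shift right until the value is zero, counting the shifts. */
static int highest_set_bit_loop(unsigned int x)
{
    int index = -1;
    while (x != 0) {
        x >>= 1;
        index++;
    }
    return index;
}

/* Small self-check comparing both versions on a few nonzero inputs. */
static void check_highest_set_bit(void)
{
    static const unsigned int samples[] = { 1u, 2u, 3u, 255u, 256u,
                                            0x00010000u, 0x7fffffffu,
                                            0x80000000u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(highest_set_bit(samples[i]) == highest_set_bit_loop(samples[i]));
}
```

Compiling highest_set_bit with gcc -O2 -S, once without and once with -march=haswell, should make it possible to compare the generated code for the two cases the comment mentions.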