https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83171
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|x86_64-linux-gnu            |
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2017-11-26
          Component|c++                         |tree-optimization
               Host|x86_64-linux-gnu            |
     Ever confirmed|0                           |1
              Build|x86_64-linux-gnu            |

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Works for me with aarch64:
_Z3fooj:
.LFB1136:
        .cfi_startproc
        and     x0, x0, 255
        fmov    d0, x0
        cnt     v0.8b, v0.8b
        addv    b0, v0.8b
        umov    w0, v0.b[0]
        and     x0, x0, 255
        ret

And works for me with -march=native:
_Z3fooj:
.LFB1162:
        .cfi_startproc
        movzbl  %dil, %eax
        popcntq %rax, %rax
        ret

Basically the following is not being optimized:
  int _3;
  long unsigned int _4;
  long long unsigned int _5;
  unsigned int _6;

  <bb 2> [100.00%]:
  _6 = value_1(D) & 255;
  _5 = (long long unsigned int) _6;
  _3 = __builtin_popcountl (_5);
  _4 = (long unsigned int) _3;

To just:
  _6 = value_1(D) & 255;
  _3 = __builtin_popcount (_6);
  _4 = (long unsigned int) _3;
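
For reference, a minimal sketch of the two GIMPLE forms above, written directly with the GCC builtins. The original test case is not quoted in this comment, so this is not the reporter's source: the function names wide_popcount, narrow_popcount and check_all_bytes are hypothetical and only mirror the dump statement by statement. The point it illustrates is that zero-extending an unsigned value adds no set bits, so the popcount of the widened value equals the popcount of the narrow one.

```cpp
#include <cassert>

// Hypothetical mirror of the unoptimized dump: mask, widen, popcountl.
unsigned long wide_popcount(unsigned int value) {
    unsigned int masked = value & 255;            // _6 = value_1(D) & 255
    unsigned long long widened = masked;          // _5 = (long long unsigned int) _6
    int count = __builtin_popcountl(widened);     // _3 = __builtin_popcountl (_5)
    return static_cast<unsigned long>(count);     // _4 = (long unsigned int) _3
}

// Hypothetical mirror of the desired form: no widening before the popcount.
unsigned long narrow_popcount(unsigned int value) {
    unsigned int masked = value & 255;            // _6 = value_1(D) & 255
    int count = __builtin_popcount(masked);       // _3 = __builtin_popcount (_6)
    return static_cast<unsigned long>(count);     // _4 = (long unsigned int) _3
}

// Checks that both forms agree for every possible value of the low byte,
// including inputs with bits set above the mask.
void check_all_bytes() {
    for (unsigned int v = 0; v < 1024; ++v) {
        assert(wide_popcount(v) == narrow_popcount(v));
    }
}
```

Compiling both functions with -O2 and comparing the assembly should show whether a given GCC version and target already narrows the call; the results quoted above suggest that it depends on the target options.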