https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68647
Bug ID: 68647
Summary: __builtin_popcountll doesn't generate popcnt instructions when targeting -mpopcnt on x86_32
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jamrial at gmail dot com
Target Milestone: ---

    #include <stdint.h>

    int fn (uint64_t a)
    {
        return __builtin_popcountll(a);
    }

gcc -O2 -mpopcnt

    xor eax, eax
    popcnt rax, rdi
    ret

clang -O2 -mpopcnt

    popcnt rax, rdi
    ret

gcc -O2 -m32 -mpopcnt

    sub esp, 20
    push DWORD PTR [esp+28]
    push DWORD PTR [esp+28]
    call __popcountdi2
    add esp, 28
    ret

clang -O2 -m32 -mpopcnt

    popcnt ecx, dword ptr [esp + 8]
    popcnt eax, dword ptr [esp + 4]
    add eax, ecx
    ret

Unrelated to this ticket, but GCC should also consider following clang's example and expanding the builtins inline when the target hardware lacks the popcnt instruction. I know of at least two projects that provide their own popcount functions instead of using the builtins when popcnt is not available, because the calls to __popcount[sd]i2 are slow.
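For illustration, here is a minimal sketch of the kind of workaround a project might carry in the meantime. It is not taken from GCC, clang, or any specific project. The function names (popcount32_swar, popcount64_split, check_against_builtin) are hypothetical. The sketch assumes the compiler defines __POPCNT__ when -mpopcnt is in effect and that the 32-bit __builtin_popcount is expanded to a popcnt instruction in that case. Without popcnt, it falls back to a branch-free bit-counting routine instead of a library call.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit population count using parallel bit sums
   (no popcnt instruction, no call into libgcc). */
static inline unsigned popcount32_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums  */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes  */
}

/* 64-bit population count computed as the sum of the two 32-bit halves,
   mirroring the shape of the clang -m32 output quoted above. */
static inline unsigned popcount64_split(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
#if defined(__POPCNT__)
    /* Assumption: with -mpopcnt the 32-bit builtin becomes one popcnt each. */
    return (unsigned)__builtin_popcount(lo) + (unsigned)__builtin_popcount(hi);
#else
    return popcount32_swar(lo) + popcount32_swar(hi);
#endif
}

/* Sanity check against the builtin discussed in this report. */
static void check_against_builtin(uint64_t x)
{
    assert(popcount64_split(x) == (unsigned)__builtin_popcountll(x));
}
```

Whether this beats __popcountdi2 on a given target would need measuring; the point is only that the 64-bit case reduces to two 32-bit counts and an add.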