https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68647

            Bug ID: 68647
           Summary: __builtin_popcountll doesn't generate popcnt
                    instructions when targeting -mpopcnt on x86_32
           Product: gcc
           Version: 5.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamrial at gmail dot com
  Target Milestone: ---

#include <stdint.h>
int fn (uint64_t a) {
    return __builtin_popcountll(a);
}


gcc -O2 -mpopcnt
        xor     eax, eax
        popcnt  rax, rdi
        ret


clang -O2 -mpopcnt
        popcnt  rax, rdi
        ret


gcc -O2 -m32 -mpopcnt
        sub     esp, 20
        push    DWORD PTR [esp+28]
        push    DWORD PTR [esp+28]
        call    __popcountdi2
        add     esp, 28
        ret


clang -O2 -m32 -mpopcnt
        popcnt  ecx, dword ptr [esp + 8]
        popcnt  eax, dword ptr [esp + 4]
        add     eax, ecx
        ret
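
For reference, the clang -m32 sequence above amounts to splitting the 64-bit
value into two 32-bit halves, counting each half, and adding the results. A
minimal C sketch of that lowering follows. It assumes unsigned int is 32 bits
wide, as it is on x86, and popcount64_split is a hypothetical name used only
for illustration, not anything that exists in GCC or clang:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: a 64-bit popcount written
 * as two 32-bit popcounts. Assumes unsigned int is 32 bits wide. */
static inline int popcount64_split(uint64_t a)
{
    uint32_t lo = (uint32_t)a;          /* low 32 bits  */
    uint32_t hi = (uint32_t)(a >> 32);  /* high 32 bits */

    /* With -mpopcnt, each call can become a single popcnt instruction. */
    return __builtin_popcount(lo) + __builtin_popcount(hi);
}
```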


Unrelated to this ticket, but GCC should also consider doing what clang does and
have the builtins expand to inline code when the target hardware lacks support
for the popcnt instruction.
I know of at least two projects that provide their own popcount functions
instead of using the builtins when popcnt is not available, because the calls to
__popcount[sd]i2 are slow.
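
As an illustration of the kind of hand-written fallback meant here, below is a
sketch of the well-known branch-free bit-parallel (SWAR) popcount. It is not
taken from either project and is not necessarily what clang emits inline.
popcount64_swar is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical fallback, for illustration only: branch-free 64-bit
 * popcount that needs neither the popcnt instruction nor a libgcc call. */
static inline int popcount64_swar(uint64_t x)
{
    /* Count bits within each 2-bit field. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Sum adjacent 2-bit counts into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Sum adjacent 4-bit counts into per-byte counts. */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* The multiply accumulates all byte counts into the top byte. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```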
