Sample code:
#include <stdio.h>
int main (void)
{
long long a = 0xLL; // 48 bits set
int popcount;
#if 1
popcount = __builtin_popcountll (a);
#else
popcount = __popcountdi2 (a);
#endif
printf ("%llx popcount = %d\n", a, popcount);
return 0;
}
If -mpopcnt is enabled, this code outputs the correct value (48) only when -O3
is on (apparently it's being calculated at compile time). Without optimizations,
it apparently counts only the bits in the lower dword of the long long
variable.
OTOH, if __popcountdi2 is used, it works correctly (but according to the
assembly code it doesn't actually use the popcnt instruction, which means it's
much slower).
Test runs and output follow (note you need a CPU which supports the popcnt
instruction):
=> gcc popcnt.c -o popcnt -Wall -O0 -mpopcnt && ./popcnt
popcount = 32
=> gcc popcnt.c -o popcnt -Wall -O3 -mpopcnt && ./popcnt
popcount = 48
=> gcc popcnt.c -o popcnt -Wall -O0 && ./popcnt
popcount = 48
=> gcc popcnt.c -o popcnt -Wall -O3 && ./popcnt
popcount = 48
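Since the -O0 -mpopcnt run above appears to count only the lower dword, one
possible workaround is to count each 32-bit half separately with
__builtin_popcount and add the results. The following is only a sketch: the
helper name popcount64_split is made up for illustration, and it has not been
verified against the affected GCC version.

```c
/* Hypothetical helper (not part of the original report): count the set bits
   of a 64-bit value by summing the popcounts of its two 32-bit halves.
   Assumes unsigned int is 32 bits wide, as on the i586 target above. */
static int popcount64_split (unsigned long long x)
{
  unsigned int lo = (unsigned int) (x & 0xffffffffULL); /* lower dword */
  unsigned int hi = (unsigned int) (x >> 32);           /* upper dword */
  return __builtin_popcount (lo) + __builtin_popcount (hi);
}
```

For a value with the low 48 bits set (e.g. 0xffffffffffffULL, chosen here only
as an example), this returns 32 + 16 = 48.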
--
Summary: __builtin_popcountll fails with -O0 and -mpopcnt
Product: gcc
Version: 4.1.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rbarreira at gmail dot com
GCC build triplet: i586-suse-linux
GCC host triplet: i586-suse-linux
GCC target triplet: i586-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43406