http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
--- Comment #11 from Cristian Rodríguez <crrodriguez at opensuse dot org> ---
Not to be annoying, but compiling the test case attached to this bug report
with clang 3.3 produces code in which
inline u32 popcount64_1(u64 x) { return __builtin_popcountll(x); }
is over 3 times faster than with GCC 4.8.1 on x86_64.
I think GCC could "just" generate IFUNCs for generic targets: on x86_64, one
function with attribute target popcnt and the other a call to libgcc, which
would at least match the clang performance.
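
A rough, hand-written sketch of the dispatch idea described above, not what
GCC emits. It uses a cached function pointer in place of a real IFUNC
resolver so that it does not depend on linker/loader IFUNC support. The names
popcount64_generic, popcount64_hw, select_popcount64 and popcount64_dispatch
are made up for illustration, and the bit-twiddling fallback merely stands in
for the libgcc call.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Portable fallback (stand-in for the libgcc routine): bit-parallel count. */
static u32 popcount64_generic(u64 x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (u32)((x * 0x0101010101010101ULL) >> 56);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_POPCNT_VARIANT 1
/* POPCNT is enabled for this one function only, via the target attribute. */
static __attribute__((target("popcnt"))) u32 popcount64_hw(u64 x)
{
    return (u32)__builtin_popcountll(x);
}
#endif

typedef u32 (*popcount64_fn)(u64);

/* Plays the role an IFUNC resolver would play: pick an implementation once. */
static popcount64_fn select_popcount64(void)
{
#ifdef HAVE_POPCNT_VARIANT
    __builtin_cpu_init();                 /* initialise CPU feature data */
    if (__builtin_cpu_supports("popcnt"))
        return popcount64_hw;
#endif
    return popcount64_generic;
}

/* Public entry point: resolves on first call, then reuses the choice. */
u32 popcount64_dispatch(u64 x)
{
    static popcount64_fn impl;
    if (!impl)
        impl = select_popcount64();
    return impl(x);
}
```

Callers would use popcount64_dispatch(x) wherever popcount64_1(x) is used in
the test case. A real IFUNC would remove the per-call pointer check, since
the symbol is bound to the chosen implementation at load time.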