http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50268
--- Comment #6 from Marc Glisse <marc.glisse at normalesup dot org> 2011-09-02 08:03:22 UTC ---
(In reply to comment #5)
> This one is much better, and actually should lead to slightly better code than
> C++98, because we don't do anything if _Nw > 1 (the 32-bit case is also better
> but doesn't optimize the case _Nb % _GLIBCXX_BITSET_BITS_PER_WORD == 0 && _Nb %
> _GLIBCXX_BITSET_BITS_PER_ULL != 0. I don't care much these times)

Looks better indeed. I think the compiler should be responsible for optimizing
x&~0UL, not the library. I'll have to check that bitset<32>(x).count() has no
overhead compared to a call to __builtin_popcount.

Looks to me like _DoWork is actually _Nb<_GLIBCXX_BITSET_BITS_PER_ULL (more
intuitive, and it makes _Nw and _Extrabits useless).

I usually write the number
~((~static_cast<unsigned long long>(0)) << _Extrabits) as
(1ULL << _Extrabits)-1 and just noticed that your version would be faster at
runtime (here it is compile-time anyway), cool.
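
For reference, a minimal sketch of the two mask spellings mentioned above, checked against each other at compile time. The helper names (mask_not_shift, mask_shift_minus_one, masks_agree, count_via_bitset, count_via_builtin) are made up for illustration and are not libstdc++ internals; only the two mask expressions and the bitset<32>/__builtin_popcount pair come from the comment. It assumes unsigned long long is 64 bits wide and that the shift count stays in the range 0..63, since shifting by the full width is undefined.

```cpp
#include <bitset>
#include <cstddef>

// Form used in the patch under discussion: invert an all-ones value
// shifted left by extrabits, leaving the low extrabits bits set.
constexpr unsigned long long mask_not_shift(std::size_t extrabits) {
    return ~((~static_cast<unsigned long long>(0)) << extrabits);
}

// The more common spelling of the same low-bits mask.
constexpr unsigned long long mask_shift_minus_one(std::size_t extrabits) {
    return (1ULL << extrabits) - 1;
}

// Recursive C++11 constexpr check that both forms agree for 0..63
// (assumes a 64-bit unsigned long long).
constexpr bool masks_agree(std::size_t n = 0) {
    return n == 64 ? true
                   : (mask_not_shift(n) == mask_shift_minus_one(n)
                      && masks_agree(n + 1));
}

static_assert(mask_not_shift(0) == 0, "empty mask");
static_assert(mask_not_shift(5) == 0x1F, "five low bits set");
static_assert(masks_agree(), "both spellings give the same mask");

// The two calls whose generated code the comment proposes to compare.
inline std::size_t count_via_bitset(unsigned long x) {
    return std::bitset<32>(x).count();  // only the low 32 bits are kept
}

#if defined(__GNUC__)
// GCC/Clang builtin; not available on every compiler.
inline int count_via_builtin(unsigned int x) {
    return __builtin_popcount(x);
}
#endif
```

Compiling both count_* functions with -O2 -S and comparing the output is one way to do the overhead check; the sketch itself makes no claim about the result.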