http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50268
--- Comment #7 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-02 09:37:55 UTC ---
Hi,

(In reply to comment #6)
> Looks better indeed. I think the compiler should be responsible for optimizing
> x&~0UL, not the library. I'll have to check that bitset<32>(x).count() has no
> overhead compared to a call to __builtin_popcount.

Indeed, I had the same thought about the compiler. And really, we are already doing better than C++98 for 32-bit too, so I'm not particularly worried. But if we have time we should check, and open an optimization PR if needed.

> Looks to me like _DoWork is actually _Nb<_GLIBCXX_BITSET_BITS_PER_ULL (more
> intuitive, and it makes _Nw and _Extrabits useless). I usually write the number
> ~((~static_cast<unsigned long long>(0)) << _Extrabits) as
> (1ULL << _Extrabits)-1 and just noticed that your version would be faster at
> runtime (here it is compile-time anyway), cool.

Ah great. I'm so stupid, trying to do all the work in terms of _Nw and so on, when in this case we have _Nb itself available.

About the formula, what you noticed is interesting indeed. I guess I will stick to the more obfuscated one at compile-time too, because this way it's clear that we are doing the same adjustment normally done in _M_do_sanitize at run-time.

I'm attaching the updated patch, which I'm going to test and commit (4_6-branch too).
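
For reference, a minimal sketch of the equivalence between the two mask formulas discussed above. The helper names (mask_shift_not, mask_minus_one, same_up_to, ull_bits) are hypothetical and are not the identifiers used in the patch; the check only covers shift counts smaller than the width of unsigned long long, since shifting by the full width is undefined.

```cpp
#include <climits>

// Hypothetical helpers for illustration only; not the names used in libstdc++.
// Both return a value with the low `extrabits` bits set and all others clear.
// Precondition: extrabits < number of bits in unsigned long long.

// The form quoted from the patch: shift an all-ones word left, then invert.
constexpr unsigned long long mask_shift_not(unsigned extrabits)
{ return ~((~static_cast<unsigned long long>(0)) << extrabits); }

// The form from comment #6: shift a single one left, then subtract one.
constexpr unsigned long long mask_minus_one(unsigned extrabits)
{ return (1ULL << extrabits) - 1; }

// Width of unsigned long long in bits on this target.
constexpr unsigned ull_bits = CHAR_BIT * sizeof(unsigned long long);

// Recursively compare both formulas for every shift count in [0, n].
// Written as a single return statement so it is valid C++11 constexpr.
constexpr bool same_up_to(unsigned n)
{
  return n == 0
    ? mask_shift_not(0) == mask_minus_one(0)
    : (mask_shift_not(n) == mask_minus_one(n) && same_up_to(n - 1));
}

// Checked entirely at compile time, as in the constexpr case discussed above.
static_assert(same_up_to(ull_bits - 1), "the two mask formulas disagree");
```

Both forms yield the same mask for every valid shift count, so the choice between them here is about readability and consistency with the run-time sanitize code, not about the result.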