Dear gcc developers, Im am not sure if this is the right email address. Please give me the right email address if necessary.
I checked the implementation of std::bitset::count in gcc and find that it uses a table look up. This is not optimal. First on 64-bit machines divide-and-conquer algorithms are faster than table look up and save even memory. Second if the bitset contains several words, there are for both 32 and 64 bit machines population count algorithms that are faster than the simple loop over all words used by boost. Together with Y. Edel I developed a new version,which saves 60% against the simple algorithm. We put the code online (http://cage.ugent.be/~klein/popc.html). A disadvantage of our implementation is that we relay on aggressive optimization (-O3) of the compiler. If you are searching a compromise between speed and code complexity. The first iteration of the Harley-Seal-method (25% faster than the simple loop) would be a candidate. You find this and other population count algorithms on our homepage, too. If there is interest I can add one of the advance algorithms to the bitset libary. Best wishes Andreas Klein