https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87070
Bug ID: 87070 Summary: Combine popcount on pieces to a single popcountll Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org Target Milestone: --- #include <bitset> int f(unsigned long long x){ return std::bitset<64>(x).count(); } should be a portable way to call a 64-bit popcount if it exists. Bug 63218 (bitset always uses long, never anything larger or smaller) sadly makes it harder. The optimizer could compensate, but it doesn't, not in the "too small" case (bug 83171), and not in the "too large" case, which is the topic of this report. What I get with gcc on win64: _13 = (long unsigned int) x_2(D); _14 = x_2(D) >> 32; _15 = (long unsigned int) _14; _22 = __builtin_popcountl (_13); _29 = __builtin_popcountl (_15); _12 = (unsigned int) _22; _3 = (unsigned int) _29; _9 = _3 + _12; _4 = (int) _9; return _4; This dump is from gcc-7, but testing int f(unsigned long long x){ return __builtin_popcount(x)+__builtin_popcount(x>>32); } with trunk on linux shows it can't have changed much. To be clear, this report is specifically about transforming the sum of popcount on the 2 halves of a number into a single bigger popcount. A match.pd transformation may be sufficient.