https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87070

            Bug ID: 87070
           Summary: Combine popcount on pieces to a single popcountll
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

#include <bitset>
int f(unsigned long long x){ return std::bitset<64>(x).count(); }

should be a portable way to call a 64-bit popcount if it exists. Bug 63218
(bitset always uses long, never anything larger or smaller) sadly makes it
harder. The optimizer could compensate, but it doesn't, not in the "too small"
case (bug 83171), and not in the "too large" case, which is the topic of this
report. What I get with gcc on win64:

  _13 = (long unsigned int) x_2(D);
  _14 = x_2(D) >> 32;
  _15 = (long unsigned int) _14;
  _22 = __builtin_popcountl (_13);
  _29 = __builtin_popcountl (_15);
  _12 = (unsigned int) _22;
  _3 = (unsigned int) _29;
  _9 = _3 + _12;
  _4 = (int) _9;
  return _4;

This dump is from gcc-7, but testing

int f(unsigned long long x){
  return __builtin_popcount(x)+__builtin_popcount(x>>32);
}

with trunk on linux shows it can't have changed much.

To be clear, this report is specifically about transforming the sum of popcount
on the 2 halves of a number into a single bigger popcount. A match.pd
transformation may be sufficient.

Reply via email to