https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544

            Bug ID: 86544
           Summary: Popcount detection generates different code on C and
                    C++
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: kugan at gcc dot gnu.org, law at gcc dot gnu.org
  Target Milestone: ---

Great to see that GCC now detects the popcount loop in PR 82479!
I am seeing some curious differences between gcc and g++ though.
Consider this function:

int
pc (unsigned long long b)
{
    int c = 0;

    while (b) {
        b &= b - 1;
        c++;
    }

    return c;
}

If compiled with gcc -O3 on aarch64 this gives:
pc:
        fmov    d0, x0
        cnt     v0.8b, v0.8b
        addv    b0, v0.8b
        umov    w0, v0.b[0]
        ret

whereas if compiled with g++ -O3 it gives:
_Z2pcy:
.LFB0:
        .cfi_startproc
        fmov    d0, x0
        cmp     x0, 0
        cnt     v0.8b, v0.8b
        addv    b0, v0.8b
        umov    w0, v0.b[0]
        and     x0, x0, 255
        csel    w0, w0, wzr, ne
        ret

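which is suboptimal: the popcount of zero is already zero, so the cmp/csel
pair that selects 0 when b == 0 is redundant. It seems that phiopt3 manages to
optimise the C version better. The GIMPLE dumps just before the phiopt pass are:

For the C (good version):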

  int c;
  int _7;

  <bb 2> [local count: 118111601]:
  if (b_4(D) != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 105119324]:
  _7 = __builtin_popcountl (b_4(D));

  <bb 4> [local count: 118111601]:
  # c_12 = PHI <0(2), _7(3)>
  return c_12;


For the C++ (bad version):

  int c;
  int _7;

  <bb 2> [local count: 118111601]:
  if (b_4(D) == 0)
    goto <bb 4>; [11.00%]
  else
    goto <bb 3>; [89.00%]

  <bb 3> [local count: 105119324]:
  _7 = __builtin_popcountl (b_4(D));

  <bb 4> [local count: 118111601]:
  # c_12 = PHI <0(2), _7(3)>
  return c_12;

As you can see, the order of the gotos and the sense of the jump condition are
inverted: the C dump tests b != 0 and the C++ dump tests b == 0.

It seems to me that the two are equivalent and GCC could be doing a better job
of optimising.
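
To make the equivalence concrete, here is a minimal C sketch of the two dump
shapes next to the original loop. The function names (pc_loop, pc_guard_ne,
pc_guard_eq, check_forms_agree) are made up for illustration, and I use
__builtin_popcountll rather than the __builtin_popcountl seen in the dumps so
that the argument is 64 bits wide on any target. It only checks that the forms
return the same value; it says nothing about what phiopt does internally.

```c
#include <assert.h>

/* The loop from the report: clear the lowest set bit until none remain. */
static int
pc_loop (unsigned long long b)
{
    int c = 0;

    while (b) {
        b &= b - 1;
        c++;
    }

    return c;
}

/* Shape of the C dump: test b != 0, popcount on the true edge. */
static int
pc_guard_ne (unsigned long long b)
{
    return (b != 0) ? __builtin_popcountll (b) : 0;
}

/* Shape of the C++ dump: test b == 0, popcount on the false edge. */
static int
pc_guard_eq (unsigned long long b)
{
    return (b == 0) ? 0 : __builtin_popcountll (b);
}

/* Hypothetical helper: all three forms must match the unguarded builtin,
   including for b == 0, where the popcount is 0 anyway. */
void
check_forms_agree (unsigned long long b)
{
    int expect = __builtin_popcountll (b);

    assert (pc_loop (b) == expect);
    assert (pc_guard_ne (b) == expect);
    assert (pc_guard_eq (b) == expect);
}
```

Calling check_forms_agree on values such as 0, 1 and ~0ULL should pass without
tripping an assert, which is what I would expect if the b == 0 guard is
redundant in both orientations of the condition.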

Can we improve phiopt to handle this more effectively?
