https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544
Bug ID: 86544
Summary: Popcount detection generates different code on C and C++
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
CC: kugan at gcc dot gnu.org, law at gcc dot gnu.org
Target Milestone: ---
Great to see that GCC now detects the popcount loop in PR 82479!
I am seeing some curious differences between gcc and g++ though.
int
pc (unsigned long long b)
{
  int c = 0;
  while (b) {
    b &= b - 1;
    c++;
  }
  return c;
}
If compiled with gcc -O3 on aarch64, this gives:
pc:
        fmov d0, x0
        cnt v0.8b, v0.8b
        addv b0, v0.8b
        umov w0, v0.b[0]
        ret
whereas if compiled with g++ -O3, it gives:
_Z2pcy:
.LFB0:
        .cfi_startproc
        fmov d0, x0
        cmp x0, 0
        cnt v0.8b, v0.8b
        addv b0, v0.8b
        umov w0, v0.b[0]
        and x0, x0, 255
        csel w0, w0, wzr, ne
        ret
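which is suboptimal: it has three extra instructions (cmp, and, csel) that the C
version does not need. The cmp/csel pair re-applies the b != 0 guard, which is
redundant because the popcount of 0 is already 0. It seems that phiopt3 manages
to optimise the C version better. The GIMPLE dumps just before the phiopt pass
are: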
For C (the good version):
  int c;
  int _7;

  <bb 2> [local count: 118111601]:
  if (b_4(D) != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 105119324]:
  _7 = __builtin_popcountl (b_4(D));

  <bb 4> [local count: 118111601]:
  # c_12 = PHI <0(2), _7(3)>
  return c_12;
For C++ (the bad version):
  int c;
  int _7;

  <bb 2> [local count: 118111601]:
  if (b_4(D) == 0)
    goto <bb 4>; [11.00%]
  else
    goto <bb 3>; [89.00%]

  <bb 3> [local count: 105119324]:
  _7 = __builtin_popcountl (b_4(D));

  <bb 4> [local count: 118111601]:
  # c_12 = PHI <0(2), _7(3)>
  return c_12;
As you can see, the jump condition is inverted (== 0 instead of != 0) and the
goto targets are swapped accordingly, while the PHI node is identical. It seems
to me that the two are equivalent, so GCC should be able to optimise both to the
same code.
Can we improve phiopt to handle this more effectively?
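To make the claimed equivalence concrete, here is a minimal C sketch (assuming GCC or Clang, for __builtin_popcountll) that writes both GIMPLE shapes back as source. The names pc_ne, pc_eq, forms_agree and check_samples are hypothetical, chosen only for illustration. The dumps call __builtin_popcountl because long is 64 bits on LP64 targets such as aarch64 Linux; the sketch uses __builtin_popcountll so it does not depend on that. It only checks semantic equivalence and is not meant as a reduced testcase for the missed optimization.

```c
#include <assert.h>
#include <stdbool.h>

/* The loop from the report: each iteration clears the lowest set bit.  */
int pc (unsigned long long b)
{
  int c = 0;
  while (b) {
    b &= b - 1;
    c++;
  }
  return c;
}

/* Hypothetical source-level shape of the C (good) GIMPLE:
   test b != 0, popcount on the taken edge, 0 otherwise.  */
int pc_ne (unsigned long long b)
{
  return b != 0 ? __builtin_popcountll (b) : 0;
}

/* Hypothetical source-level shape of the C++ (bad) GIMPLE:
   the same test written as b == 0, with the two arms swapped.  */
int pc_eq (unsigned long long b)
{
  return b == 0 ? 0 : __builtin_popcountll (b);
}

/* True if the loop and both guarded forms match an unguarded
   __builtin_popcountll.  Since popcount (0) == 0, the guard is redundant.  */
bool forms_agree (unsigned long long b)
{
  int expected = __builtin_popcountll (b);
  return pc (b) == expected && pc_ne (b) == expected && pc_eq (b) == expected;
}

/* Check a few values, including b == 0, the case the guard exists for.  */
void check_samples (void)
{
  static const unsigned long long samples[] = {
    0, 1, 2, 3, 0xff, 0x8000000000000000ULL, ~0ULL
  };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (forms_agree (samples[i]));
}
```

Since both guarded shapes compute the same value as the bare builtin for every input, phiopt could in principle fold either one to a plain popcount, as it already does for the C input.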