https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544
            Bug ID: 86544
           Summary: Popcount detection generates different code on C and C++
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: kugan at gcc dot gnu.org, law at gcc dot gnu.org
  Target Milestone: ---

Great to see that GCC now detects the popcount loop in PR 82479!
I am seeing some curious differences between gcc and g++ though.

int
pc (unsigned long long b)
{
  int c = 0;

  while (b)
    {
      b &= b - 1;
      c++;
    }

  return c;
}

If compiled with gcc -O3 on aarch64 this gives:

pc:
        fmov    d0, x0
        cnt     v0.8b, v0.8b
        addv    b0, v0.8b
        umov    w0, v0.b[0]
        ret

whereas if compiled with g++ -O3 it gives:

_Z2pcy:
.LFB0:
        .cfi_startproc
        fmov    d0, x0
        cmp     x0, 0
        cnt     v0.8b, v0.8b
        addv    b0, v0.8b
        umov    w0, v0.b[0]
        and     x0, x0, 255
        csel    w0, w0, wzr, ne
        ret

which is suboptimal. It seems that phiopt3 manages to optimise the C version
better. The GIMPLE dumps just before the phiopt pass are:

For the C (good version):

  int c;
  int _7;

  <bb 2> [local count: 118111601]:
  if (b_4(D) != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 105119324]:
  _7 = __builtin_popcountl (b_4(D));

  <bb 4> [local count: 118111601]:
  # c_12 = PHI <0(2), _7(3)>
  return c_12;

For the C++ (bad version):

  int c;
  int _7;

  <bb 2> [local count: 118111601]:
  if (b_4(D) == 0)
    goto <bb 4>; [11.00%]
  else
    goto <bb 3>; [89.00%]

  <bb 3> [local count: 105119324]:
  _7 = __builtin_popcountl (b_4(D));

  <bb 4> [local count: 118111601]:
  # c_12 = PHI <0(2), _7(3)>
  return c_12;

As you can see, the order of the gotos and the jump conditions is inverted.
It seems to me that the two are equivalent and GCC could be doing a better job
of optimising. Can we improve phiopt to handle this more effectively?
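
For reference, here is a small C sketch of why the two dumps should be equivalent. It is not taken from the GCC sources, and the names pc_guarded, pc_guarded_inverted, pc_unguarded and check_all are made up for illustration. It assumes a compiler that provides __builtin_popcountll (GCC or Clang); the dumps use __builtin_popcountl because long is 64 bits on aarch64. Both guarded forms return 0 for b == 0, and popcount of 0 is 0 anyway, so in either form the guard and the PHI are redundant and a plain popcount call would do.

```c
#include <assert.h>
#include <stddef.h>

/* Loop from the report: each iteration clears the lowest set bit. */
static int pc(unsigned long long b)
{
    int c = 0;
    while (b) {
        b &= b - 1;
        c++;
    }
    return c;
}

/* Shape of the C dump: test b != 0, popcount on the true arm. */
static int pc_guarded(unsigned long long b)
{
    return b != 0 ? __builtin_popcountll(b) : 0;
}

/* Shape of the C++ dump: test b == 0, arms swapped. */
static int pc_guarded_inverted(unsigned long long b)
{
    return b == 0 ? 0 : __builtin_popcountll(b);
}

/* What one would hope both forms reduce to: no guard at all. */
static int pc_unguarded(unsigned long long b)
{
    return __builtin_popcountll(b);
}

/* Check that all four forms agree on a few sample inputs, including 0. */
static void check_all(void)
{
    static const unsigned long long samples[] = {
        0ULL, 1ULL, 2ULL, 0xFFULL, 0x8000000000000000ULL, ~0ULL
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        unsigned long long b = samples[i];
        int expected = pc(b);
        assert(pc_guarded(b) == expected);
        assert(pc_guarded_inverted(b) == expected);
        assert(pc_unguarded(b) == expected);
    }
}
```

Calling check_all() from a main function exercises all four forms. Compiling the individual functions with gcc -O3 and g++ -O3 and comparing the assembly may be a convenient way to see whether the inverted form still keeps the cmp/csel pair.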