https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117607
Bug ID: 117607 Summary: unnecessary scev optimization for popcnt Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: fxue at os dot amperecomputing.com Target Milestone: --- Look at the case: int *foo(long v, int *p) { while (v) { long t = v; v &= (v - 1); *p++ = __builtin_ctzl(t); } return p; } On arch with popcnt supported, such as aarch64, with "-O2 -ftree-scev-cprop", it generates: foo: .LFB0: .cfi_startproc cbz x0, .L4 mov x4, x1 mov x2, x0 .p2align 5,,15 .L3: rbit x3, x2 clz x3, x3 str w3, [x4], 4 sub x3, x2, #1 ands x2, x2, x3 bne .L3 fmov d31, x0 add x1, x1, 4 cnt v31.8b, v31.8b addv b31, v31.8b fmov w0, s31 sub w2, w0, #1 add x0, x1, w2, uxtw 2 ret .p2align 2,,3 .L4: mov x0, x1 ret For ""-O2 -fno-tree-scev-cprop", it gets simpler codegen: .LFB0: .cfi_startproc mov x2, x0 mov x0, x1 cbz x2, .L2 .p2align 5,,15 .L3: rbit x1, x2 sub x3, x2, #1 clz x1, x1 str w1, [x0], 4 ands x2, x2, x3 bne .L3 .L2: ret The cause is that scev would compute exit value of "p" using POPCNT in one-shot. However, since "p" value is used and has to be evaluated at every iteration, so the computation at exit is unneeded. final value replacement: p_15 = PHI <p_11(3)> with expr: (int *) (((sizetype) (unsigned int) (.POPCOUNT ((unsigned long) v_7(D)) + -1) * 4 + (sizetype) p_8(D)) + 4) final stmt: p_15 = (int *) _25