https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120892

            Bug ID: 120892
           Summary: Missed unrolling at -O3 due to split-paths
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

Consider the following testcase, on aarch64:

typedef struct {
  long ob_refcnt;
  int size;
} PyListObject;
PyListObject *src;
PyListObject dst;
int main(void) {
#pragma GCC unroll 8
  for (long i = 0; i < 128; i++) {
    PyListObject *v = &src[i];
    if (v->ob_refcnt >= 0)
      v->ob_refcnt++;
    dst = *v;
  }
}

at -O2 it is unrolled successfully, as requested.  I see:

./xgcc -B . -c t.c -S -o /dev/null -O2 -fdump-rtl-loop2_unroll-details=- | grep "unroll"
t.c:9:3: note: considering unrolling loop 1 at BB 3
considering unrolling loop with constant number of iterations
t.c:9:3: optimized: loop unrolled 7 times

but at -O3 we fail to unroll the loop:

./xgcc -B . -c t.c -S -o /dev/null -O3 -fdump-rtl-loop2_unroll-details=- | grep "unroll"
t.c:11:10: note: considering unrolling loop 1 at BB 3
considering unrolling loop with constant number of iterations
considering unrolling loop with runtime-computable number of iterations
considering unrolling loop stupidly
;; Not unrolling, contains branches

the problem seems to be due to the split-paths gimple pass.  Disabling the
pass:

./xgcc -B . -c t.c -S -o /dev/null -O3 -fno-split-paths -fdump-rtl-loop2_unroll-details=- | grep "unroll"
t.c:9:3: note: considering unrolling loop 1 at BB 3
considering unrolling loop with constant number of iterations
t.c:9:3: optimized: loop unrolled 7 times

the loop is unrolled as expected.

Before split-paths, in SLSR, we have this loop:

  <bb 3> [local count: 1063004407]:
  # ivtmp.12_14 = PHI <ivtmp.12_15(5), ivtmp.12_8(2)>
  _6 = (void *) ivtmp.12_14;
  _4 = MEM[(long int *)_6];
  if (_4 >= 0)
    goto <bb 4>; [59.00%]
  else
    goto <bb 5>; [41.00%]

  <bb 4> [local count: 627172604]:
  _5 = _4 + 1;
  _22 = (void *) ivtmp.12_14;
  MEM[(long int *)_22] = _5;

  <bb 5> [local count: 1063004410]:
  _21 = (void *) ivtmp.12_14;
  dst = MEM[(struct PyListObject *)_21];
  ivtmp.12_15 = ivtmp.12_14 + 16;
  if (ivtmp.12_15 != _24)
    goto <bb 3>; [98.99%]
  else
    goto <bb 6>; [1.01%]

and then split paths duplicates the contents of the common code in bb 5 into
the conditional block (bb 4), to give:

  <bb 3> [local count: 1063004407]:
  # ivtmp.12_14 = PHI <ivtmp.12_13(6), ivtmp.12_8(2)>
  _6 = (void *) ivtmp.12_14;
  _4 = MEM[(long int *)_6];
  if (_4 >= 0)
    goto <bb 4>; [59.00%]
  else
    goto <bb 5>; [41.00%]

  <bb 4> [local count: 627172604]:
  _5 = _4 + 1;
  _22 = (void *) ivtmp.12_14;
  MEM[(long int *)_22] = _5;
  _16 = (void *) ivtmp.12_14;
  dst = MEM[(struct PyListObject *)_16];
  ivtmp.12_2 = ivtmp.12_14 + 16;
  if (ivtmp.12_2 != _24)
    goto <bb 6>; [98.99%]
  else
    goto <bb 7>; [1.01%]

  <bb 5> [local count: 435831806]:
  _21 = (void *) ivtmp.12_14;
  dst = MEM[(struct PyListObject *)_21];
  ivtmp.12_15 = ivtmp.12_14 + 16;
  if (ivtmp.12_15 != _24)
    goto <bb 6>; [98.99%]
  else
    goto <bb 7>; [1.01%]

  <bb 6> [local count: 1052266997]:
  # ivtmp.12_13 = PHI <ivtmp.12_15(5), ivtmp.12_2(4)>
  goto <bb 3>; [100.00%]

it's not clear to me that this transformation is even beneficial independently
of the unrolling issue, but it's unfortunate that doing this blocks the
requested unrolling.
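
For readers who prefer source over GIMPLE, here is a rough C-level sketch of the two loop shapes.  It is a hand-written approximation, not compiler output: the function names loop_original, loop_split and shapes_agree are hypothetical, src and dst are passed as parameters rather than being globals as in the testcase, and the trip count is fixed at 128.  It only illustrates that after split-paths each arm of the conditional carries its own copy of the store to dst, the induction-variable increment and the exit test, so the loop body ends in two exit branches instead of one.

```c
#include <assert.h>

typedef struct {
  long ob_refcnt;
  int size;
} PyListObject;

enum { N = 128 };

/* Shape before split-paths: one conditional block (bb 4), then a shared
   tail (bb 5) holding the copy, the increment and the single exit test.  */
static void loop_original(PyListObject *src, PyListObject *dst) {
  for (long i = 0; i < N; i++) {
    PyListObject *v = &src[i];
    if (v->ob_refcnt >= 0)
      v->ob_refcnt++;
    *dst = *v;
  }
}

/* Approximate shape after split-paths (hand-written, an assumption):
   the shared tail is duplicated into both arms, so each arm has its own
   copy, increment and exit test.  */
static void loop_split(PyListObject *src, PyListObject *dst) {
  assert(N > 0); /* the bottom-tested form assumes at least one iteration */
  long i = 0;
  for (;;) {
    PyListObject *v = &src[i];
    if (v->ob_refcnt >= 0) {
      v->ob_refcnt++;
      *dst = *v;     /* duplicated tail, copy 1 */
      i++;
      if (i == N)
        break;
    } else {
      *dst = *v;     /* duplicated tail, copy 2 */
      i++;
      if (i == N)
        break;
    }
  }
}

/* Runs both shapes on identical inputs; returns 1 if the final arrays
   and the final dst value match, 0 otherwise.  */
static int shapes_agree(void) {
  PyListObject a[N], b[N];
  PyListObject da = {0, 0}, db = {0, 0};
  for (long i = 0; i < N; i++) {
    /* Mix of negative and non-negative counts so both arms execute.  */
    a[i].ob_refcnt = (i % 3 == 0) ? -1 - i : i;
    a[i].size = (int)i;
    b[i] = a[i];
  }
  loop_original(a, &da);
  loop_split(b, &db);
  for (long i = 0; i < N; i++)
    if (a[i].ob_refcnt != b[i].ob_refcnt || a[i].size != b[i].size)
      return 0;
  return da.ob_refcnt == db.ob_refcnt && da.size == db.size;
}
```

The two functions compute the same result; the difference is purely in control-flow shape, which is what the "Not unrolling, contains branches" message from loop2_unroll appears to be reacting to.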