https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120892

            Bug ID: 120892
           Summary: Missed unrolling at -O3 due to split-paths
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

Consider the following testcase, on aarch64:

typedef struct {
  long ob_refcnt;
  int size;
} PyListObject;
PyListObject *src;
PyListObject dst;
int main(void) {
#pragma GCC unroll 8
  for (long i = 0; i < 128; i++) {
    PyListObject *v = &src[i];
    if (v->ob_refcnt >= 0)
      v->ob_refcnt++;
    dst = *v;
  }
}

At -O2 it is unrolled successfully, as requested.  I see:

./xgcc -B . -c t.c -S -o /dev/null -O2 -fdump-rtl-loop2_unroll-details=- | grep "unroll"
t.c:9:3: note: considering unrolling loop 1 at BB 3
considering unrolling loop with constant number of iterations
t.c:9:3: optimized: loop unrolled 7 times
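For reference, "loop unrolled 7 times" means seven extra copies of the body, i.e. an unroll factor of 8, matching the pragma. Since 128 is a multiple of 8, no remainder loop is needed. The following is a hand-written C approximation of that result, not the compiler's actual output. The function names are hypothetical, and dst is passed as a parameter instead of being a global so the sketch can be exercised on its own.

```c
#include <assert.h>

typedef struct {
  long ob_refcnt;
  int size;
} PyListObject;

/* One iteration of the testcase's loop body. */
static void body(PyListObject *src, PyListObject *dst, long i)
{
  PyListObject *v = &src[i];
  if (v->ob_refcnt >= 0)
    v->ob_refcnt++;
  *dst = *v;
}

/* The loop as written: one body per iteration. */
static void loop_original(PyListObject *src, PyListObject *dst, long n)
{
  for (long i = 0; i < n; i++)
    body(src, dst, i);
}

/* Unrolled by a factor of 8 ("unrolled 7 times" = 7 extra copies).
   With n a multiple of 8 no remainder loop is needed; the assert
   documents that precondition.  Each copy still contains its branch. */
static void loop_unrolled8(PyListObject *src, PyListObject *dst, long n)
{
  assert(n % 8 == 0);
  for (long i = 0; i < n; i += 8) {
    body(src, dst, i + 0);
    body(src, dst, i + 1);
    body(src, dst, i + 2);
    body(src, dst, i + 3);
    body(src, dst, i + 4);
    body(src, dst, i + 5);
    body(src, dst, i + 6);
    body(src, dst, i + 7);
  }
}
```

Both versions leave src and dst in the same state. The unrolled one simply evaluates the loop exit test once per 8 elements.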

but at -O3 we fail to unroll the loop:

./xgcc -B . -c t.c -S -o /dev/null -O3 -fdump-rtl-loop2_unroll-details=- | grep "unroll"
t.c:11:10: note: considering unrolling loop 1 at BB 3
considering unrolling loop with constant number of iterations
considering unrolling loop with runtime-computable number of iterations
considering unrolling loop stupidly
;; Not unrolling, contains branches

The problem seems to be due to the split-paths gimple pass.  Disabling the
pass:

./xgcc -B . -c t.c -S -o /dev/null -O3 -fno-split-paths -fdump-rtl-loop2_unroll-details=- | grep "unroll"
t.c:9:3: note: considering unrolling loop 1 at BB 3
considering unrolling loop with constant number of iterations
t.c:9:3: optimized: loop unrolled 7 times

the loop is unrolled as expected.  Before split-paths, i.e. in the SLSR
dump, we have this loop:

  <bb 3> [local count: 1063004407]:
  # ivtmp.12_14 = PHI <ivtmp.12_15(5), ivtmp.12_8(2)>
  _6 = (void *) ivtmp.12_14;
  _4 = MEM[(long int *)_6];
  if (_4 >= 0)
    goto <bb 4>; [59.00%]
  else
    goto <bb 5>; [41.00%]

  <bb 4> [local count: 627172604]:
  _5 = _4 + 1;
  _22 = (void *) ivtmp.12_14;
  MEM[(long int *)_22] = _5;

  <bb 5> [local count: 1063004410]:
  _21 = (void *) ivtmp.12_14;
  dst = MEM[(struct PyListObject *)_21];
  ivtmp.12_15 = ivtmp.12_14 + 16;
  if (ivtmp.12_15 != _24)
    goto <bb 3>; [98.99%]
  else
    goto <bb 6>; [1.01%]
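To make the dump easier to follow, here is a rough C rendering of this loop. It is a hand-written approximation, not compiler output, and the function name is hypothetical. It assumes that ivtmp.12_8 is the address of src[0] and that _24 is the address one past src[127]. Both are defined in bb 2, which is not shown, so this is an inference from the +16 step (sizeof (PyListObject) on LP64) and the 128-iteration trip count.

```c
#include <assert.h>

typedef struct {
  long ob_refcnt;
  int size;
} PyListObject;

/* Rough C equivalent of the loop in the SLSR dump: the index i has been
   replaced by a byte-pointer IV that advances by sizeof (PyListObject)
   per iteration and is compared against a precomputed end pointer. */
static void loop_pointer_iv(PyListObject *src, PyListObject *dst, long n)
{
  assert(n > 0);                          /* no entry guard: trip count is constant */
  char *iv = (char *)src;                 /* ivtmp.12_8 (assumed) */
  char *end = (char *)(src + n);          /* _24 (assumed) */
  do {
    /* bb 3: load and test the refcount. */
    PyListObject *v = (PyListObject *)iv; /* _6 */
    long r = v->ob_refcnt;                /* _4 */
    if (r >= 0)
      v->ob_refcnt = r + 1;               /* bb 4: _5 */
    /* bb 5: join block, reached from both bb 3 and bb 4. */
    *dst = *v;
    iv += sizeof(PyListObject);           /* ivtmp.12_15 = ivtmp.12_14 + 16 */
  } while (iv != end);                    /* if (ivtmp.12_15 != _24) */
}
```

Note that the loop has a single exit test, in bb 5, which runs on every iteration.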

and then split-paths duplicates the join block (bb 5) into the conditional
block (bb 4), so that each path gets its own copy of the store to dst and of
the exit test:

  <bb 3> [local count: 1063004407]:
  # ivtmp.12_14 = PHI <ivtmp.12_13(6), ivtmp.12_8(2)>
  _6 = (void *) ivtmp.12_14;
  _4 = MEM[(long int *)_6];
  if (_4 >= 0)
    goto <bb 4>; [59.00%]
  else
    goto <bb 5>; [41.00%]

  <bb 4> [local count: 627172604]:
  _5 = _4 + 1;
  _22 = (void *) ivtmp.12_14;
  MEM[(long int *)_22] = _5;
  _16 = (void *) ivtmp.12_14;
  dst = MEM[(struct PyListObject *)_16];
  ivtmp.12_2 = ivtmp.12_14 + 16;
  if (ivtmp.12_2 != _24)
    goto <bb 6>; [98.99%]
  else
    goto <bb 7>; [1.01%]

  <bb 5> [local count: 435831806]:
  _21 = (void *) ivtmp.12_14;
  dst = MEM[(struct PyListObject *)_21];
  ivtmp.12_15 = ivtmp.12_14 + 16;
  if (ivtmp.12_15 != _24)
    goto <bb 6>; [98.99%]
  else
    goto <bb 7>; [1.01%]

  <bb 6> [local count: 1052266997]:
  # ivtmp.12_13 = PHI <ivtmp.12_15(5), ivtmp.12_2(4)>
  goto <bb 3>; [100.00%]
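A similar hand-written C approximation of the loop after split-paths makes the structural change visible. Again this is a sketch, not compiler output; the function name is hypothetical and _24 is assumed to be the end pointer as before. The store to dst, the IV increment and the exit test now appear once on each path. As a result, the loop has two exit tests, and neither runs on every iteration. That is plausibly what stops the RTL unroller from treating this as a loop with a constant number of iterations, but the dumps above do not confirm it.

```c
#include <assert.h>

typedef struct {
  long ob_refcnt;
  int size;
} PyListObject;

/* Rough C equivalent of the loop after split-paths.  Block numbers and
   SSA names in the comments refer to the dump above. */
static void loop_split_paths(PyListObject *src, PyListObject *dst, long n)
{
  assert(n > 0);                          /* no entry guard: trip count is constant */
  char *iv = (char *)src;                 /* ivtmp.12_14 */
  char *end = (char *)(src + n);          /* _24 (assumed) */
  for (;;) {
    /* bb 3: load and test the refcount. */
    PyListObject *v = (PyListObject *)iv;
    long r = v->ob_refcnt;                /* _4 */
    char *next;
    if (r >= 0) {
      /* bb 4: increment, then its own copy of the old join block. */
      v->ob_refcnt = r + 1;               /* _5 */
      *dst = *v;
      next = iv + sizeof(PyListObject);   /* ivtmp.12_2 */
      if (next == end)                    /* exit test #1 */
        return;
    } else {
      /* bb 5: the other copy of the old join block. */
      *dst = *v;
      next = iv + sizeof(PyListObject);   /* ivtmp.12_15 */
      if (next == end)                    /* exit test #2 */
        return;
    }
    /* bb 6: new latch, merging the two IV values (ivtmp.12_13). */
    iv = next;
  }
}
```

The result is still equivalent to the original loop; only the control flow has changed.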

It's not clear to me that this transformation is beneficial even
independently of the unrolling issue, but it's unfortunate that it blocks the
requested unrolling.
