https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68373
Bug ID: 68373
Summary: autopar fails on loop exit phi with argument defined outside loop
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
First consider a parloops testcase test.c, with a use of the final value of the
iteration variable (return i):
...
unsigned int
foo (int *a, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;
  return i;
}
...
Say we compile the testcase like this:
...
$ gcc -S -O2 test.c -ftree-parallelize-loops=2 -fdump-tree-all-details
...
We find in the parloops dump-file:
...
SUCCESS: may be parallelized
...
Autoparallelization is possible because -ftree-scev-cprop replaces the final
value of the iteration variable, which eliminates the only loop exit phi:
...
final value replacement:
i_1 = PHI <i_10(4)>
with
i_1 = n_4(D);
...
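As a source-level illustration of that replacement, here is a minimal C sketch. The names foo_signed and foo_signed_final_value are mine and do not appear in GCC. It assumes the usual guarded loop shape, in which the replaced phi is only reached after at least one iteration, so the value there is n, while a skipped loop leaves i at 0:

```c
/* The loop from test.c, renamed for this sketch.  */
unsigned int
foo_signed (int *a, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;
  return i;
}

/* Hand-written closed form of the value foo_signed returns.
   On the path that leaves the loop, i equals n; if the loop is
   skipped (n <= 0), i is still 0.  */
unsigned int
foo_signed_final_value (int n)
{
  return n > 0 ? n : 0;
}
```

With the final value expressed in terms of n alone, nothing computed inside the loop is needed after it.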
Now consider a similar testcase test-2.c, but with an unsigned loop counter and
bound:
...
unsigned int
foo (int *a, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;
  return i;
}
...
Here too, -ftree-scev-cprop eliminates the only loop exit phi:
...
i_2 = PHI <i_10(3)>
with
i_2 = n_4(D);
...
But a subsequent pass_copy_prop propagates i_2 (this did not happen in test.c
because of signedness differences) and eliminates the empty bb, which
effectively reintroduces a loop exit phi:
...
<bb 4>:
# i_12 = PHI <0(3), i_10(5)>
_5 = (long unsigned int) i_12;
_6 = _5 * 4;
_8 = a_7(D) + _6;
*_8 = 1;
i_10 = i_12 + 1;
if (n_4(D) > i_10)
  goto <bb 5>;
else
  goto <bb 6>;

<bb 5>:
goto <bb 4>;

<bb 6>:
# i_14 = PHI <n_4(D)(4), 0(2)>
...
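The control flow in this dump can be approximated in C as follows. This is a hand-written sketch, not compiler output: the function names are mine, and the guard condition in bb 2 is an assumption, since the dump above only shows that bb 2 has an edge to bb 6 carrying the value 0.

```c
/* The loop from test-2.c, renamed for this sketch.  */
unsigned int
foo_unsigned (int *a, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;
  return i;
}

/* Approximate C rendering of the dump after copy propagation.  */
unsigned int
foo_unsigned_lowered (int *a, unsigned int n)
{
  unsigned int i;
  unsigned int result;            /* plays the role of i_14 */

  if (n == 0)                     /* assumed guard in <bb 2> */
    result = 0;                   /* phi argument 0(2) */
  else
    {
      i = 0;                      /* phi argument 0(3) of i_12 */
      do
        {
          a[i] = 1;               /* *_8 = 1 */
          i = i + 1;              /* i_10 = i_12 + 1 */
        }
      while (n > i);              /* if (n_4(D) > i_10) */
      result = n;                 /* phi argument n_4(D)(4) */
    }
  return result;
}
```

The value that reaches the phi from the loop exit edge is n, which is defined before the loop, not anything computed by the loop body.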
And in parloops, during
gather_scalar_reductions/vect_analyze_loop_form/vect_analyze_loop_form_1, we
split the loop exit edge, which results in this exit phi:
...
<bb 4>:
# i_12 = PHI <0(3), i_10(5)>
_5 = (long unsigned int) i_12;
_6 = _5 * 4;
_8 = a_7(D) + _6;
*_8 = 1;
i_10 = i_12 + 1;
if (n_4(D) > i_10)
  goto <bb 5>;
else
  goto <bb 7>;

<bb 5>:
goto <bb 4>;

<bb 7>:
# n_2 = PHI <n_4(D)(4)>
...
And then parloops stumbles over that phi:
...
phi is n_2 = PHI <n_4(D)(4)>
arg of phi to exit: value n_4(D) used outside loop
checking if it a part of reduction pattern:
FAILED: it is not a part of reduction.
...
In split_loop_exit_edge we preserve loop-closed ssa, which is why it introduces
phis. But we should be able to optimize
n_2 = PHI <n_4(D)(4)>
into
n_2 = n_4(D)
without breaking loop-closed ssa form, since n_4(D) is not defined inside the
loop.
Alternatively, we could improve parloops to allow phis like this and handle
them.
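Either approach comes down to the same test: does the exit phi argument have its definition inside the loop? A toy model of that test is sketched below. All types and functions here (toy_ssa_name, toy_loop, toy_exit_phi_is_invariant_copy and so on) are hypothetical stand-ins invented for illustration. They are not GCC's data structures or API, and real code would use GCC's own loop and SSA queries.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not GCC's types.  */
enum { TOY_ENTRY_BB = 0 };  /* hypothetical block index for default defs */

struct toy_ssa_name
{
  const char *name;
  int def_bb;               /* block holding the definition */
};

struct toy_loop
{
  const int *bbs;           /* indices of the blocks in the loop */
  size_t num_bbs;
};

static bool
toy_bb_in_loop (const struct toy_loop *loop, int bb)
{
  for (size_t k = 0; k < loop->num_bbs; ++k)
    if (loop->bbs[k] == bb)
      return true;
  return false;
}

/* Return true if a single-argument exit phi with argument ARG only
   forwards a value defined outside LOOP, i.e. it acts as a plain copy.  */
static bool
toy_exit_phi_is_invariant_copy (const struct toy_loop *loop,
                                const struct toy_ssa_name *arg)
{
  return !toy_bb_in_loop (loop, arg->def_bb);
}

/* n_2 = PHI <n_4(D)(4)>: n_4(D) is a parameter, defined outside the loop.  */
bool
toy_check_n_2 (void)
{
  static const int bbs[] = { 4, 5 };
  const struct toy_loop loop = { bbs, 2 };
  const struct toy_ssa_name n_4 = { "n_4(D)", TOY_ENTRY_BB };
  return toy_exit_phi_is_invariant_copy (&loop, &n_4);
}

/* i_1 = PHI <i_10(4)>: i_10 is defined in bb 4, inside the loop.  */
bool
toy_check_i_10 (void)
{
  static const int bbs[] = { 4, 5 };
  const struct toy_loop loop = { bbs, 2 };
  const struct toy_ssa_name i_10 = { "i_10", 4 };
  return toy_exit_phi_is_invariant_copy (&loop, &i_10);
}
```

In this model the phi for n_2 is classified as a plain copy, so it could be either folded away or skipped by the reduction check, while a phi whose argument is i_10 still needs the existing handling.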