https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68373
Bug ID: 68373
Summary: autopar fails on loop exit phi with argument defined outside loop
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

First consider a parloops testcase test.c, with a use of the final value of the
iteration variable (return i):
...
unsigned int
foo (int *a, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;

  return i;
}
...

Say we compile the testcase like this:
...
$ gcc -S -O2 test.c -ftree-parallelize-loops=2 -fdump-tree-all-details
...

We find in the parloops dump-file:
...
  SUCCESS: may be parallelized
...

The autoparallelization is possible because -ftree-scev-cprop substitutes the
final iteration variable value, which eliminates the only loop exit phi:
...
final value replacement:
  i_1 = PHI <i_10(4)>
  with
  i_1 = n_4(D);
...

Now consider a similar testcase test-2.c, but with loop counter and bound
unsigned:
...
unsigned int
foo (int *a, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;

  return i;
}
...

Also here, we have -ftree-scev-cprop eliminating the only loop exit phi:
...
  i_2 = PHI <i_10(3)>
  with
  i_2 = n_4(D);
...

But, in a subsequent pass_copy_prop, we manage to propagate i_2 (that didn't
happen in test.c because of signedness differences), eliminate the empty bb,
effectively reintroducing a new loop exit phi:
...
  <bb 4>:
  # i_12 = PHI <0(3), i_10(5)>
  _5 = (long unsigned int) i_12;
  _6 = _5 * 4;
  _8 = a_7(D) + _6;
  *_8 = 1;
  i_10 = i_12 + 1;
  if (n_4(D) > i_10)
    goto <bb 5>;
  else
    goto <bb 6>;

  <bb 5>:
  goto <bb 4>;

  <bb 6>:
  # i_14 = PHI <n_4(D)(4), 0(2)>
...

And in parloops during
gather_scalar_reductions/vect_analyze_loop_form/vect_analyze_loop_form_1 we
split the loop exit edge, which results in this exit phi:
...
  <bb 4>:
  # i_12 = PHI <0(3), i_10(5)>
  _5 = (long unsigned int) i_12;
  _6 = _5 * 4;
  _8 = a_7(D) + _6;
  *_8 = 1;
  i_10 = i_12 + 1;
  if (n_4(D) > i_10)
    goto <bb 5>;
  else
    goto <bb 7>;

  <bb 5>:
  goto <bb 4>;

  <bb 7>:
  # n_2 = PHI <n_4(D)(4)>
...

And then parloops stumbles over that phi:
...
phi is n_2 = PHI <n_4(D)(4)>
arg of phi to exit:  value n_4(D) used outside loop
  checking if it a part of reduction pattern:
  FAILED: it is not a part of reduction.
...

In split_loop_exit_edge we preserve loop-closed ssa, which is why it introduces
phis. But we should be able to optimize n_2 = PHI <n_4(D)(4)> into n_2 = n_4(D)
without breaking loop-closed ssa form.

Alternatively, we could improve parloops to allow phis like this and handle
them.
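
For anyone wanting to compare the two variants in a single file, here is a
minimal sketch of a combined testcase. The names foo_signed, foo_unsigned,
all_ones and check_variants are hypothetical and not part of the original
test.c/test-2.c; the two loop functions are otherwise the ones from the report.
The checker only confirms the observable behaviour (the returned final value of
i equals n, and the first n elements are set to 1), which is what any fix in
split_loop_exit_edge or parloops would have to preserve.

```c
#include <assert.h>

/* test.c from the report: signed loop counter and bound.  */
unsigned int
foo_signed (int *a, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;

  return i;
}

/* test-2.c from the report: unsigned loop counter and bound.  */
unsigned int
foo_unsigned (int *a, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    a[i] = 1;

  return i;
}

/* Hypothetical helper: return 1 if the first N elements of A are all 1.  */
static int
all_ones (const int *a, unsigned int n)
{
  unsigned int k;
  for (k = 0; k < n; ++k)
    if (a[k] != 1)
      return 0;

  return 1;
}

/* Hypothetical checker: run both variants for the same N (at most 16) and
   return 1 if both return N and both set the first N elements to 1.  */
int
check_variants (unsigned int n)
{
  int a[16] = { 0 };
  int b[16] = { 0 };
  unsigned int rs, ru;

  assert (n <= 16);
  rs = foo_signed (a, (int) n);
  ru = foo_unsigned (b, n);

  return rs == n && ru == n && all_ones (a, n) && all_ones (b, n);
}
```

Adding a main that calls check_variants and compiling with the same options as
above should make it possible to look at the parloops dump for both functions
side by side; which of them reports SUCCESS will depend on the compiler
revision.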