https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25623

--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The testcase from comment #1 is wontfix (there is really not much to do at
threading time, since the profile was not estimated realistically).
The original Fortran testcase now works
(after fix g:7e904d6c7f252ee947c237ed32dd43b2c248384d).
We do one threading in the thread2 pass:

 Registering killing_def (path_oracle) _1
 Registering killing_def (path_oracle) ubound.4_14
Checking profitability of path (backwards):  
  [1] Registering jump thread: (2, 3) incoming edge;  (3, 6) nocopy;
path: 2->3->6 SUCCESS
Checking profitability of path (backwards):  bb:4 (6 insns) bb:10 (latch)
  Control statement insns: 2
  Overall: 4 insns


and give up on two because they cross a loop boundary.

Checking profitability of path (backwards):  bb:3 (2 insns) bb:4 (latch)
  Control statement insns: 2
  Overall: 0 insns

 Registering killing_def (path_oracle) S.6_56
path: 4->3->xx REJECTED
Checking profitability of path (backwards):  bb:6 (2 insns) bb:7 (latch)
  Control statement insns: 2
  Overall: 0 insns

 Registering killing_def (path_oracle) i_68
path: 7->6->xx REJECTED
 headers pass.

One path is the usual loop entry condition, which is known to be true (I think
early opts should handle it); it is eventually dealt with by the copy header
pass. The other path eventually gets a reason for the failure dumped:

Checking profitability of path (backwards):  bb:4 (16 insns) bb:6 (latch)
  Control statement insns: 2
  Overall: 14 insns
  FAIL: Did not thread around loop and would copy too many statements.
__attribute__((fn spec (". w w w w ")))

This is the fact that the loop is known to iterate at least once (there is an
explicit +1). It may be interesting to peel for this.
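
As an illustration of what peeling buys when the trip count is known to be at
least one, here is a minimal Python sketch. It mirrors the loop body visible in
the GIMPLE dumps in this comment (pow to 1/3, two squarings, updates of b and
c); the function names are hypothetical, and this only models the source-level
transformation, not what any GCC pass actually does.

```python
def _body(a, b, c, i):
    # Loop body as seen in the GIMPLE: tmp = a**(1/3), tmp2 = tmp*tmp,
    # tmp4 = tmp2*tmp2, then b += tmp4 and c += tmp2.
    tmp = a[i] ** (1.0 / 3.0)
    tmp2 = tmp * tmp
    tmp4 = tmp2 * tmp2
    b[i] += tmp4
    c[i] += tmp2


def update_rolled(a, b, c):
    # Original form: the entry test runs before the first iteration too.
    for i in range(len(a)):
        _body(a, b, c, i)


def update_peeled(a, b, c):
    # Peeled form (hypothetical): the first iteration runs unconditionally,
    # which is only valid because the trip count is assumed to be >= 1.
    n = len(a)
    assert n >= 1, "peeling the first iteration assumes at least one trip"
    _body(a, b, c, 0)
    for i in range(1, n):
        _body(a, b, c, i)
```

Both forms perform the same operations in the same order, so they give
identical results whenever the input has at least one element.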

With -O3 we vectorize the loop and unroll the epilogue. However, we get:

;;   basic block 14, loop depth 1, count 668941153 (estimated locally), maybe hot
;;    prev block 16, next block 15, flags: (NEW, REACHABLE, VISITED)
;;    pred:       15 [always]  count:595357627 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;                16 [always]  count:73583526 (estimated locally) (FALLTHRU)
  # i_34 = PHI <i_31(15), i_29(16)>
  _2 = i_34 + -1;
  _17 = (integer(kind=8)) _2;
  _18 = (*a_19(D))[_17];
  tmp_45 = __builtin_pow (_18, 3.33333333333333314829616256247390992939472198486328125e-1);
  tmp2_44 = tmp_45 * tmp_45;
  tmp4_43 = tmp2_44 * tmp2_44;
  _42 = (*b_24(D))[_17];
  _41 = _42 + tmp4_43;
  (*b_24(D))[_17] = _41;
  _39 = (*c_16(D))[_17];
  _38 = _39 + tmp2_44;
  (*c_16(D))[_17] = _38;
  i_31 = i_34 + 1;
  if (_1 < i_31)
    goto <bb 17>; [11.00%]
  else
    goto <bb 15>; [89.00%]

Cunrolli unloops it without fixing the profile, resulting in an inconsistent
profile:

;;   basic block 16, loop depth 0, count 668941153 (estimated locally), maybe hot
;;   Invalid sum of incoming counts 73583527 (estimated locally), should be 668941153 (estimated locally)
;;    prev block 13, next block 17, flags: (NEW, REACHABLE, VISITED)
;;    pred:       13 [66.7% (guessed)]  count:63071594 (estimated locally) (FALSE_VALUE)
;;                7 [10.0% (guessed)]  count:10511933 (estimated locally) (TRUE_VALUE)
  # i_29 = PHI <tmp.21_9(13), 1(7)>
  _2 = i_29 + -1;
  _17 = (integer(kind=8)) _2;
  _18 = (*a_19(D))[_17];
  tmp_45 = __builtin_pow (_18, 3.33333333333333314829616256247390992939472198486328125e-1);
  tmp2_44 = tmp_45 * tmp_45;
  tmp4_43 = tmp2_44 * tmp2_44;
  _42 = (*b_24(D))[_17];
  _41 = _42 + tmp4_43;
  (*b_24(D))[_17] = _41;
  _39 = (*c_16(D))[_17];
  _38 = _39 + tmp2_44;
  (*c_16(D))[_17] = _38;
  i_31 = i_29 + 1;
;;    succ:       17 [always (guessed)]  count:668941153 (estimated locally) (FALLTHRU)

;;   basic block 17, loop depth 0, count 105119324 (estimated locally), maybe hot
;;   Invalid sum of incoming counts 700476950 (estimated locally), should be 105119324 (estimated locally)
;;    prev block 16, next block 5, flags: (NEW, VISITED)
;;    pred:       16 [always (guessed)]  count:668941153 (estimated locally) (FALLTHRU)
;;                13 [33.3% (guessed)]  count:31535797 (estimated locally) (TRUE_VALUE)
;;    succ:       5 [always]  count:105119324 (estimated locally) (FALLTHRU,EXECUTABLE)

So I guess unlooping should fix the profile after itself, but does vect really
need to produce loops iterating precisely once?
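
The "Invalid sum of incoming counts" figures can be reproduced by adding up
the predecessor edge counts quoted in the dumps above. The following is a
minimal Python sketch using only those numbers; the table layout and the
helper name are hypothetical, and the exact-equality test is a simplification
that should not be taken as GCC's actual profile verifier.

```python
# Block count and incoming edge counts, copied from the dumps above.
BLOCKS = {
    14: {"count": 668941153, "preds": [595357627, 73583526]},
    16: {"count": 668941153, "preds": [63071594, 10511933]},
    17: {"count": 105119324, "preds": [668941153, 31535797]},
}


def inconsistent_blocks(blocks):
    """Return {bb: sum of incoming counts} for blocks whose incoming
    edge counts do not add up to the block's own count."""
    bad = {}
    for bb, info in blocks.items():
        incoming = sum(info["preds"])
        if incoming != info["count"]:
            bad[bb] = incoming
    return bad


if __name__ == "__main__":
    for bb, incoming in inconsistent_blocks(BLOCKS).items():
        print(f"bb {bb}: incoming {incoming}, should be {BLOCKS[bb]['count']}")
```

Under this check bb 14 (before unlooping) is consistent, while bb 16 and
bb 17 are flagged with the same sums the dump reports.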
