https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351

--- Comment #8 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Looking at it some more, I think the loop is valid to vectorize, but we don't
seem to vectorize the reduction that jumps back to the outer loop:

;;   basic block 384, loop depth 3, count 8598980 (estimated locally, freq
26.9637), maybe hot
;;    prev block 383, next block 458, flags: (NEW, REACHABLE, VISITED)
;;    pred:       387 [94.5% (guessed)]  count:8598980 (estimated locally, freq
26.9637) (TRUE_VALUE,EXECUTABLE)
  # RANGE [irange] int [1, +INF]
  _1643 = ci_y_1924 + 1;
  _3519 = vect_vec_iv_.4957_3520 + { 4, 4, 4, 4 };
  # PT = nonlocal escaped null
  vectp.4959_3503 = vectp.4959_3504 + 16;
  ivtmp_3495 = ivtmp_3496 + 4;
  _3493 = (unsigned int) ivtmp_3495;
  next_mask_3470 = .WHILE_ULT (_3493, _3494, { 0, 0, 0, 0 });
  if (next_mask_3470 == { 0, 0, 0, 0 })
    goto <bb 844>; [5.50%]
  else
    goto <bb 458>; [94.50%]
;;    succ:       844 [5.5% (guessed)]  count:472944 (estimated locally, freq
1.4830) (TRUE_VALUE,EXECUTABLE)
;;                458 [94.5% (guessed)]  count:8126036 (estimated locally, freq
25.4807) (FALSE_VALUE,EXECUTABLE)

;;   basic block 458, loop depth 3, count 8126036 (estimated locally, freq
25.4807), maybe hot
;;    prev block 384, next block 844, flags: (NEW, REACHABLE, VISITED)
;;    pred:       384 [94.5% (guessed)]  count:8126036 (estimated locally, freq
25.4807) (FALSE_VALUE,EXECUTABLE)
  goto <bb 387>; [100.00%]
;;    succ:       387 [always]  count:8126036 (estimated locally, freq 25.4807)
(FALLTHRU,DFS_BACK,EXECUTABLE)

;;   basic block 844, loop depth 2, count 472944 (estimated locally, freq
1.4830), maybe hot
;;    prev block 458, next block 840, flags: (NEW, VISITED)
;;    pred:       384 [5.5% (guessed)]  count:472944 (estimated locally, freq
1.4830) (TRUE_VALUE,EXECUTABLE)
  # RANGE [irange] int [1, +INF]
  # _3469 = PHI <_1643(384)>
  _3538 = niters.4954_3792;
  _3536 = (intD.10) _3538;
  tmp.4955_3537 = ci_y_1923 + _3536;
  if (_3538 == niters.4954_3792)
    goto <bb 385>; [25.00%]
  else
    goto <bb 840>; [75.00%]

where

  _1643 = ci_y_1924 + 1;

has stayed scalar and so LCSSA code inserts a PHI here:

  # _3469 = PHI <_1643(384)>

which is unused, as BB 384 is considered the main exit. So the vectorizer
assumes that if you exit via 384 -> 844 you've done all iterations, and it just
uses niters + ci_y_1923, i.e. it just adds the number of iterations to ci_y.
So I don't think that's wrong.
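
To make that argument concrete, here is a minimal scalar sketch of the inner
loop. It is hand-written from the dump and is not the original source or a
reproducer: the function name, the parameter names and the int element type
are all assumptions. It only shows that whenever the counted exit is taken,
the final counter equals the start value plus the number of iterations.

```c
#include <assert.h>

/* Hypothetical scalar model of loop_48; names and types are assumptions.
   Returns 1 when the counted exit (384 -> 385 in the dump) is taken,
   0 when the data-dependent exit (387 -> 388) is taken.  */
static int
scan_column (const int *data, int start, int bound, int limit, int *y_out)
{
  assert (start < bound);
  int y = start;
  for (;;)
    {
      if (data[y + 1] > limit)   /* like the else arm of: if (_1657 <= ci_1918) */
        {
          *y_out = y;
          return 0;
        }
      y = y + 1;                 /* like: _1643 = ci_y_1924 + 1 */
      if (y == bound)            /* like: if (_1643 == _1649) */
        {
          *y_out = y;            /* here y == start + (bound - start) */
          return 1;
        }
    }
}
```

With data = {0, ..., 8}, start 0 and bound 8, a limit of 100 takes the counted
exit with y == 8, i.e. start plus 8 iterations; a limit of 3 takes the early
exit with y == 3.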

I could use some help here, richi, on whether this loop *is* valid to vectorize
or not.  I've not yet been able to create a small reproducer, but the loop looks
like:

(gdb) p debug_loop (loop, 3)
loop_48 (header = 387, latch = 458, finite_p
upper_bound 2147483647
likely_upper_bound 2147483647
iterations by profile: 8.347976 (unreliable, maybe flat) entry count:1081571
(estimated locally, freq 3.3915))
{
  bb_384 (preds = {bb_387 }, succs = {bb_385 bb_458 })
  {
    <bb 384> [local count: 9554422]:
    _1643 = ci_y_1924 + 1;
    if (_1643 == _1649)
      goto <bb 385>; [5.50%]
    else
      goto <bb 458>; [94.50%]

  }
  bb_458 (preds = {bb_384 }, succs = {bb_387 })
  {
    <bb 458> [local count: 9028929]:
    goto <bb 387>; [100.00%]

  }
  bb_387 (preds = {bb_458 bb_386 }, succs = {bb_384 bb_388 })
  {
    <bb 387> [local count: 10110500]:
    # ci_y_1924 = PHI <_1643(458), ci_y_1923(386)>
    _1651 = _1650 + ci_y_1924;
    _1652 = _1651 + 1;
    _1653 = (long unsigned int) _1652;
    _1655 = _1653 * 4;
    _1656 = _1654 + _1655;
    # VUSE <.MEM_1725>
    _1657 = *_1656;
    if (_1657 <= ci_1918)
      goto <bb 384>; [94.50%]
    else
      goto <bb 388>; [5.50%]

  }
}

(gdb) p debug_bb_n_slim (388)
;; basic block 388, loop depth 1
;;  pred:       387
# ci_x_1703 = PHI <ci_x_1920(387)>
# _2327 = PHI <_1651(387)>
# ci_y_2179 = PHI <ci_y_1924(387)>
if (_333 != 0)
  goto <bb 62>; [50.00%]
else
  goto <bb 61>; [50.00%]
;;  succ:       61
;;              62

(gdb) p debug_bb_n_slim (385)
;; basic block 385, loop depth 2
;;  pred:       384
_1646 = ci_x_1920 + 1;
;;  succ:       386

(gdb) p debug_bb_n_slim (386)
;; basic block 386, loop depth 2
;;  pred:       382
;;              385
# ci_x_1920 = PHI <ci_x_1919(382), _1646(385)>
# ci_y_1923 = PHI <ci_y_1922(382), 0(385)>
_1650 = _1649 * ci_x_1920;
;;  succ:       387

(gdb) p debug (pre_header)
<edge (386 -> 387)>

So the reduction values look sane and the vector code looks sane. I'll instead
focus first on values that change during the loop.
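
For reference, the dumps of bb 384 to 388 read back into roughly the following
C shape. This is a sketch under assumptions, not the real source and not a
reproducer: scan_grid, struct scan_pos and the int element type are made up
for illustration, and the sketch assumes some element exceeds ci so that the
search terminates.

```c
#include <assert.h>

/* Hypothetical result type: the values live out of the loop in bb 388.  */
struct scan_pos
{
  int x;    /* like ci_x_1703 */
  int y;    /* like ci_y_2179 */
  int idx;  /* like _2327 */
};

/* Hypothetical reconstruction of the loop nest; assumes a hit exists.  */
static struct scan_pos
scan_grid (const int *base, int stride, int x, int y, int ci)
{
  assert (stride > 0);
  for (;;)                               /* bb 386: outer loop */
    {
      int row = stride * x;              /* _1650 = _1649 * ci_x_1920 */
      for (;;)                           /* loop_48 */
        {
          int idx = row + y;             /* _1651 = _1650 + ci_y_1924 */
          if (base[idx + 1] > ci)        /* early exit 387 -> 388 */
            return (struct scan_pos) { x, y, idx };
          y = y + 1;                     /* _1643 = ci_y_1924 + 1 */
          if (y == stride)               /* counted exit 384 -> 385 */
            break;
        }
      x = x + 1;                         /* _1646 = ci_x_1920 + 1 */
      y = 0;                             /* ci_y_1923 = PHI <..., 0(385)> */
    }
}
```

Note that on the counted exit ci_y is reset to 0 in bb 386, which matches the
incremented value being unused on that path.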
