https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112281

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
ldist distributes the loop nest, which looks like

  <bb 3> [local count: 118111600]:
  # c.6_23 = PHI <_4(9), 2(2)>
  # ivtmp_1 = PHI <ivtmp_27(9), 2(2)>
  _13 = c.6_23 + 1;

  <bb 4> [local count: 955630224]:
  # e.4_22 = PHI <_2(10), 0(3)>
  # ivtmp_28 = PHI <ivtmp_3(10), 2(3)>
  b = d[_13];
  d[c.6_23] = b;
  d[_13].a = 0;
  _2 = e.4_22 + 1;
  ivtmp_3 = ivtmp_28 - 1;
  if (ivtmp_3 != 0)
    goto <bb 10>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 10> [local count: 850510900]:
  goto <bb 4>; [100.00%]

  <bb 5> [local count: 118111600]:
  _4 = c.6_23 + -1;
  ivtmp_27 = ivtmp_1 - 1;
  if (ivtmp_27 != 0)
    goto <bb 9>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 9> [local count: 105119324]:
  goto <bb 3>; [100.00%]

so there is no evolution of the DR indices in the inner loop.  _4 is also
dead and is eliminated by the transform.  -fdisable-tree-ivcanon avoids
the dead code but doesn't change the fact we miscompile this.
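
To see why statement order matters in this nest, here is a minimal Python
model of the GIMPLE above.  It is only a sketch under assumptions: each
element of d is reduced to its .a field, the initial values are made up, and
run_split is a hypothetical distribution that moves the d[_13].a = 0 store
into its own inner loop.  It is not taken from the actual ldist output.

```python
def run_original(d):
    """Run the nest in source order; returns (final d, final b)."""
    d = list(d)
    b = None
    c = 2                        # c.6_23 starts at 2
    for _ in range(2):           # outer loop: ivtmp_1 counts 2 -> 0
        for _ in range(2):       # inner loop: ivtmp_28 counts 2 -> 0
            b = d[c + 1]         # b = d[_13];
            d[c] = b             # d[c.6_23] = b;
            d[c + 1] = 0         # d[_13].a = 0;  (only .a is modelled)
        c -= 1                   # _4 = c.6_23 + -1
    return d, b


def run_split(d):
    """Hypothetical distribution: the zeroing store gets its own inner loop."""
    d = list(d)
    b = None
    c = 2
    for _ in range(2):
        for _ in range(2):       # partition 1: load and copy
            b = d[c + 1]
            d[c] = b
        for _ in range(2):       # partition 2: zeroing store
            d[c + 1] = 0
        c -= 1
    return d, b


if __name__ == "__main__":
    start = [10, 20, 30, 40]     # made-up initial .a values
    print(run_original(start))
    print(run_split(start))
```

With the made-up input [10, 20, 30, 40], source order ends with
d = [10, 0, 0, 0] and b = 0, while the split order ends with
d = [10, 40, 0, 0] and b = 40: the second inner iteration no longer sees the
zero stored by the first.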

One interesting fact is that we have

Creating dr for d[c.6_23]
...
        base_object: d
        Access function 0: {2, +, -1}_1

but

Creating dr for d[_13].a
...
        base_object: d
        Access function 0: 32
        Access function 1: {3, +, -1}_1

but we do seem to get along here, computing

(compute_affine_dependence
  ref_a: d[c.6_23], stmt_a: d[c.6_23] = b;
  ref_b: d[_13].a, stmt_b: d[_13].a = 0;
(analyze_overlapping_iterations
  (chrec_a = {2, +, -1}_1)
  (chrec_b = {3, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [0 + 1 * x_1])
  (overlaps_b = [1 + 1 * x_1]))
)
  (overlap_iterations_a = [0 + 1 * x_1])
  (overlap_iterations_b = [1 + 1 * x_1]))
(analyze_overlapping_iterations
  (chrec_a = {3, +, -1}_1)
  (chrec_b = {2, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [1 + 1 * x_1])
  (overlaps_b = [0 + 1 * x_1]))
)
  (overlap_iterations_a = [1 + 1 * x_1])
  (overlap_iterations_b = [0 + 1 * x_1]))
(build_classic_dist_vector
  dist_vector = (1 0
  )
)

(reversed)
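
The overlap functions in the dump can be checked by hand.  The sketch below
evaluates the two chrecs {2, +, -1}_1 and {3, +, -1}_1 as plain affine
functions of the outer iteration number and lists the iteration pairs that
touch the same element.  The helper names are made up for illustration and
the trip count n is a parameter; the GIMPLE above runs the outer loop twice.

```python
def chrec(base, step):
    """Affine chrec {base, +, step}: value at iteration x is base + step*x."""
    return lambda x: base + step * x


def overlapping_iterations(n):
    """Pairs (x, y) where d[c.6_23] at x and d[_13] at y hit the same index."""
    ref_a = chrec(2, -1)   # index of d[c.6_23]
    ref_b = chrec(3, -1)   # index of d[_13]
    return [(x, y) for x in range(n) for y in range(n)
            if ref_a(x) == ref_b(y)]


def distances(n):
    """Set of outer-loop distances y - x over all overlapping pairs."""
    return {y - x for x, y in overlapping_iterations(n)}


if __name__ == "__main__":
    print(overlapping_iterations(2))
    print(distances(4))
```

Every overlapping pair has the form (x, x + 1), which matches
overlaps_a = [0 + 1 * x_1] and overlaps_b = [1 + 1 * x_1], and the outer-loop
distance is always 1.  Neither index depends on the inner loop, which is why
the second entry of the distance vector is 0.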

OTOH we compute the same distance for

(compute_affine_dependence
  ref_a: d[_13], stmt_a: b = d[_13];
  ref_b: d[c.6_23], stmt_b: d[c.6_23] = b;
(analyze_overlapping_iterations
  (chrec_a = {3, +, -1}_1)
  (chrec_b = {2, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [1 + 1 * x_1])
  (overlaps_b = [0 + 1 * x_1]))
)
  (overlap_iterations_a = [1 + 1 * x_1])
  (overlap_iterations_b = [0 + 1 * x_1]))
(analyze_overlapping_iterations
  (chrec_a = {2, +, -1}_1)
  (chrec_b = {3, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [0 + 1 * x_1])
  (overlaps_b = [1 + 1 * x_1]))
)
  (overlap_iterations_a = [0 + 1 * x_1])
  (overlap_iterations_b = [1 + 1 * x_1]))
(build_classic_dist_vector
  dist_vector = (1 0
  )
)

I think that

              /* If the overlap is exact preserve stmt order.  */
              else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0),
                                            DDR_NB_LOOPS (ddr)))
                ;

is not good enough.  If the dependence distance is zero between any two
iterations we have to preserve execution order.  That means the innermost
loop may not have zero dependence distance.  This was also adjusted for
PR87022.
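
As a small illustration of why the quoted check does not fire here, the
sketch below models lambda_vector_zerop as an all-entries-zero test and
applies it to the dist_vector (1 0) from the dumps.  innermost_is_zero is a
hypothetical helper that only looks at the last (innermost) entry.  It is one
possible reading of the paragraph above, not the actual fix.

```python
def is_zero_vector(dist, nb_loops):
    """Model of lambda_vector_zerop: true iff the first nb_loops entries are 0."""
    return all(dist[i] == 0 for i in range(nb_loops))


def innermost_is_zero(dist, nb_loops):
    """Hypothetical check: true iff the innermost loop's distance is 0."""
    return dist[nb_loops - 1] == 0


if __name__ == "__main__":
    dist_vector = [1, 0]   # (outer, inner) as printed by build_classic_dist_vector
    print(is_zero_vector(dist_vector, 2))
    print(innermost_is_zero(dist_vector, 2))
```

For (1 0) the all-zero test is false, so the "preserve stmt order" branch is
skipped, even though the inner-loop component of the distance is zero.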
