[Bug tree-optimization/112281] [12/13/14 Regression] wrong code at -O3 on x86_64-linux-gnu since r12-2097-g9f34b780b0461e

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 14 Nov 2023 02:38:57 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112281


--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
ldist distributes the loop nest, which looks like

  <bb 3> [local count: 118111600]:
  # c.6_23 = PHI <_4(9), 2(2)>
  # ivtmp_1 = PHI <ivtmp_27(9), 2(2)>
  _13 = c.6_23 + 1;

  <bb 4> [local count: 955630224]:
  # e.4_22 = PHI <_2(10), 0(3)>
  # ivtmp_28 = PHI <ivtmp_3(10), 2(3)>
  b = d[_13];
  d[c.6_23] = b;
  d[_13].a = 0;
  _2 = e.4_22 + 1;
  ivtmp_3 = ivtmp_28 - 1;
  if (ivtmp_3 != 0)
    goto <bb 10>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 10> [local count: 850510900]:
  goto <bb 4>; [100.00%]

  <bb 5> [local count: 118111600]:
  _4 = c.6_23 + -1;
  ivtmp_27 = ivtmp_1 - 1;
  if (ivtmp_27 != 0)
    goto <bb 9>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 9> [local count: 105119324]:
  goto <bb 3>; [100.00%]

so there is no evolution of the DR indices in the inner loop.  _4 is also
dead and is eliminated by the transform.  -fdisable-tree-ivcanon avoids
the dead code but doesn't change the fact we miscompile this.
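For orientation, the nest above corresponds roughly to the following C.  This
is a hand-written sketch, not the testcase from the PR: the element type
`struct elem` (in particular its `pad` member), the function name `run_nest`
and passing `d` as a parameter are all assumptions made so the snippet is
self-contained.  Only the names b, c, d, e and the field `a` come from the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type: the dump only shows a field `a`;
   the leading `pad` member is an assumption.  */
struct elem { int pad; int a; };

/* Runs the nest in source order on an array of at least 4 elements
   and returns the final value of the temporary `b`.  */
static struct elem run_nest(struct elem *d)
{
    assert(d != NULL);
    struct elem b = {0, 0};
    for (int c = 2; c != 0; c--) {      /* outer loop, two iterations */
        for (int e = 0; e < 2; e++) {   /* inner loop, two iterations */
            b = d[c + 1];               /* b = d[_13];     */
            d[c] = b;                   /* d[c.6_23] = b;  */
            d[c + 1].a = 0;             /* d[_13].a = 0;   */
        }
    }
    return b;
}
```

Note that none of the three accesses uses `e`, which is the "no evolution in
the inner loop" observation: every inner iteration touches the same elements.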

One interesting fact is that we have

Creating dr for d[c.6_23]
...
        base_object: d
        Access function 0: {2, +, -1}_1

but

Creating dr for d[_13].a
...
        base_object: d
        Access function 0: 32
        Access function 1: {3, +, -1}_1

but we do seem to get along here, computing

(compute_affine_dependence
  ref_a: d[c.6_23], stmt_a: d[c.6_23] = b;
  ref_b: d[_13].a, stmt_b: d[_13].a = 0;
(analyze_overlapping_iterations
  (chrec_a = {2, +, -1}_1)
  (chrec_b = {3, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [0 + 1 * x_1])
  (overlaps_b = [1 + 1 * x_1]))
)
  (overlap_iterations_a = [0 + 1 * x_1])
  (overlap_iterations_b = [1 + 1 * x_1]))
(analyze_overlapping_iterations
  (chrec_a = {3, +, -1}_1)
  (chrec_b = {2, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [1 + 1 * x_1])
  (overlaps_b = [0 + 1 * x_1]))
)
  (overlap_iterations_a = [1 + 1 * x_1])
  (overlap_iterations_b = [0 + 1 * x_1]))
(build_classic_dist_vector
  dist_vector = (1 0
  )
)

(reversed)

OTOH we compute the same distance for

(compute_affine_dependence
  ref_a: d[_13], stmt_a: b = d[_13];
  ref_b: d[c.6_23], stmt_b: d[c.6_23] = b;
(analyze_overlapping_iterations
  (chrec_a = {3, +, -1}_1)
  (chrec_b = {2, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [1 + 1 * x_1])
  (overlaps_b = [0 + 1 * x_1]))
)
  (overlap_iterations_a = [1 + 1 * x_1])
  (overlap_iterations_b = [0 + 1 * x_1]))
(analyze_overlapping_iterations
  (chrec_a = {2, +, -1}_1)
  (chrec_b = {3, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [0 + 1 * x_1])
  (overlaps_b = [1 + 1 * x_1]))
)
  (overlap_iterations_a = [0 + 1 * x_1])
  (overlap_iterations_b = [1 + 1 * x_1]))
(build_classic_dist_vector
  dist_vector = (1 0
  )
)
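The overlap functions in both dumps can be redone by hand.  A chrec
{base, +, step}_1 evaluates to base + step * i at iteration i of loop 1, so two
chrecs with the same step hit the same index when the iteration numbers differ
by (base_a - base_b) / step.  The sketch below only redoes that arithmetic;
`struct affine_chrec`, `chrec_eval` and `same_step_distance` are made-up names
and are not GCC's representation or API.

```c
#include <assert.h>

/* Minimal stand-in for an affine chrec {base, +, step}.  */
struct affine_chrec { int base; int step; };

/* Value of the access function at iteration `i` of its loop.  */
static int chrec_eval(struct affine_chrec c, int i)
{
    return c.base + c.step * i;
}

/* For two chrecs with the same nonzero step, returns k such that `a`
   at iteration x and `b` at iteration x + k touch the same index.
   Stores 1 in *ok when such an integer k exists, 0 otherwise.  */
static int same_step_distance(struct affine_chrec a, struct affine_chrec b,
                              int *ok)
{
    assert(a.step == b.step && a.step != 0);
    /* a.base + s*x == b.base + s*(x + k)  =>  k = (a.base - b.base) / s */
    int diff = a.base - b.base;
    *ok = (diff % a.step == 0);
    return diff / a.step;
}
```

For {2, +, -1}_1 against {3, +, -1}_1 this gives k = 1, matching
overlaps_a = [0 + 1 * x_1] and overlaps_b = [1 + 1 * x_1].  Since neither
access function evolves in the inner loop, the inner component is 0, which
is consistent with the reported dist_vector of (1 0).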

I think that

              /* If the overlap is exact preserve stmt order.  */
              else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0),
                                            DDR_NB_LOOPS (ddr)))
                ;

is not good enough.  If the dependence distance is zero between any two
iterations we have to preserve execution order.  That means the innermost
loop may not have zero dependence distance.  This was also adjusted for
PR87022.
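To make the difference concrete, here is a small standalone sketch, not a
patch.  `dist_all_zero` models what the quoted lambda_vector_zerop check
tests (every component zero); `dist_innermost_zero` is a hypothetical stricter
test following the reasoning above.  Both helper names and the plain `int`
array standing in for the distance vector are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the quoted check: true only when every component is zero.  */
static bool dist_all_zero(const int *dist, int nb_loops)
{
    assert(nb_loops > 0);
    for (int i = 0; i < nb_loops; i++)
        if (dist[i] != 0)
            return false;
    return true;
}

/* Hypothetical stricter test: true when the innermost (last)
   component of the distance vector is zero.  */
static bool dist_innermost_zero(const int *dist, int nb_loops)
{
    assert(nb_loops > 0);
    return dist[nb_loops - 1] == 0;
}
```

For the dist_vector (1 0) computed above, `dist_all_zero` is false, so the
"preserve stmt order" branch is not taken, while `dist_innermost_zero` is
true and would flag the pair as order-sensitive.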

[Bug tree-optimization/112281] [12/13/14 Regression] wrong code at -O3 on x86_64-linux-gnu since r12-2097-g9f34b780b0461e
