https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105771

Michael Matz <matz at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |matz at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #9 from Michael Matz <matz at gcc dot gnu.org> ---
(thanks for the ping, Sam)

Hmpf.  It's again one of those cases where our distance vectors for data
dependences are simply wrong; that's a very long-standing problem :-(

The loop-nest in question looks like this (slightly rewriting to have only
one read and one unconditional write):

  for (i)
    for (j) {
      t = i<=j ? src[i][j] : 0;   // R
      dst[j][i] = t;              // W
    }

Let's assume we have only two elements per dimension; then the iteration order
and accesses are

  i j  R      W
  0 0  0,0    0,0
  0 1  0,1    1,0
  1 0  1,0    0,1
  1 1  1,1    1,1

So, element (0,1) is read in iteration 0,1 and written in iteration 1,0.
Unroll-and-jam maps iteration (a,b) to (a/N,b) (and otherwise leaves the order
of evaluations the same).  So, with an unroll factor of 2, the write to (0,1)
is now done in iteration 0,0, which comes before iteration 0,1, and hence the
element is now written before it is read, boom.

The unroll-and-jam code uses distance vectors to determine the safety of the
transformation.  If the distance vectors are lexicographically positive both
before and after the transformation, then the transformation is semantically
correct.

Here the write-after-read dependence occurs between iterations 0,1 and 1,0,
so the distance vector is (1,-1).  That's lexicographically positive.
After the unroll-and-jam transformation the new distance vector would be
(0,-1), which is lexicographically negative, and hence would be rejected by
the appropriate test in adjust_unroll_factor() (or rather, not rejected, but
the unroll factor would be adjusted to "1", and hence the transformation
effectively wouldn't be done).

Now, look at this in the dump file:

(Data Dep: 
#(Data Ref: 
#  bb: 4 
#  stmt: iftmp.0_12 = (*src_11(D))[i_17][j_19].v;
#  ref: (*src_11(D))[i_17][j_19].v;
#  base_object: (*src_11(D));
#  Access function 0: 0
#  Access function 1: {0, +, 1}_2
#  Access function 2: {0, +, 1}_1
#)
#(Data Ref: 
#  bb: 5 
#  stmt: (*_3)[i_17].v = iftmp.0_6;
#  ref: (*_3)[i_17].v;
#  base_object: (*dest_13(D));
#  Access function 0: 0
#  Access function 1: {0, +, 1}_1
#  Access function 2: {0B, +, 68}_2
#)
  access_fn_A: 0
  access_fn_B: 0

 (subscript 
  iterations_that_access_an_element_twice_in_A: [0]
  last_conflict: scev_not_known
  iterations_that_access_an_element_twice_in_B: [0]
  last_conflict: scev_not_known
  (Subscript distance: 0 ))
  loop nest: (1 2 )
  distance_vector: 0 0 
  distance_vector: 1 0 
  distance_vector: 0 1 
  direction_vector:     =    =
  direction_vector:     +    =
  direction_vector:     =    +
)

No distance vector (1,-1) anywhere.  All the distance vectors (per
subscript/access function) that are there are non-negative and remain so
after the unroll-and-jam transformation, so as far as the unroll-and-jam code
is concerned the transformation is valid.

(Note that all the above arguments only apply when src==dst; otherwise the
distance between them of course influences in which iteration the same
elements are accessed.  Making this explicit in the sources by s/dest/src/
doesn't change the picture here; the same problematic distance vectors are
computed.)

I've seen this in the past already, when I thought, uhhh, these distance
vectors in our data-dep-relations often look wonky.  The code to compute them
is very old, and contains some peculiar things that I think can't be quite
right in multi-loop nests when multiple subscripts are involved (at least).
But it's also relatively complex, so we've often just worked around issues
with it in the clients that use it (which aren't that many, (un)fortunately,
so bugs there remain rare and confined to corner cases).

I'll try (again) to make sense of our distance-vector computation code, and
see what can be done about this.
