https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105771
Michael Matz <matz at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|unassigned at gcc dot gnu.org |matz at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #9 from Michael Matz <matz at gcc dot gnu.org> --- (thanks for the ping, Sam) Hmpf. It's again one of these cases where our distance vectors for data dependences are simply wrong; that's a very long standing problem :-( The loop-nest in question looks like this (slightly rewriting to have only one read and one unconditional write): for (i) for (j) t = i<=j ? src[i][j] : 0; // R dst[j][i] = t; // W Let's assume we have only two elems per dimension, then the iteration order and accesses are i j R W 0 0 0,0 0,0 0 1 0,1 1,0 1 0 1,0 0,1 1 1 1,1 1,1 So, element (0,1) is read in iteration 0,1 and written in iteration 1,0. unroll-and-jam maps iteration (a,b) to (a/N,b) (and leaves order of evaluations otherwise same). So, with an unroll factor of 2, the write to (0,1) is now done in iteration 0,0, which is before iteration 0,1, and hence the element is now written before read, boom. The unroll-jam code uses distance vectors to compute safety of the transformation. When the distance vectors are lexicographically positive before and after the transformation then the transformation is correct semantically. Here we have the write-after-read dependence occur in iterations 0,1 and 1,0 so the distance vector is (1,-1). That's lexicographically positive. After the unroll-jam transformation the new distance vector would be (0,-1), which is lexicographically negative, and hence would be rejected by the appropriate test in adjust_unroll_factor() (or rather, not rejected, but the unroll factor would be adjust to "1", and hence the transformation effectively wouldn't be done). Now, look at this in the dump file: (Data Dep: #(Data Ref: # bb: 4 # stmt: iftmp.0_12 = (*src_11(D))[i_17][j_19].v; # ref: (*src_11(D))[i_17][j_19].v; # base_object: (*src_11(D)); # Access function 0: 0 # Access function 1: {0, +, 1}_2 # Access function 2: {0, +, 1}_1 #) #(Data Ref: # bb: 5 # stmt: (*_3)[i_17].v = iftmp.0_6; # ref: (*_3)[i_17].v; # base_object: (*dest_13(D)); # Access function 0: 0 # Access function 1: {0, +, 1}_1 # Access function 2: {0B, +, 68}_2 #) access_fn_A: 0 access_fn_B: 0 (subscript iterations_that_access_an_element_twice_in_A: [0] last_conflict: scev_not_known iterations_that_access_an_element_twice_in_B: [0] last_conflict: scev_not_known (Subscript distance: 0 )) loop nest: (1 2 ) distance_vector: 0 0 distance_vector: 1 0 distance_vector: 0 1 direction_vector: = = direction_vector: + = direction_vector: = + ) No distance vector (1,-1) anywhere. All the distance vectors (per subscribt/access function) that are there are and remain non-negative after the unroll-jam transformation, so as far as the unroll-jam code is concerned the transformation is valid. (Note that all the above arguments only apply when src==dst, otherwise the distance between them of course influences in which iteration the same elements are accessed; making this explicit in the sources by s/dest/src/ doesn't change the picture here, the same problematic distance vectors are computed). I've seen this in the past already when I thought, uhhh, these distance vectors in our data-dep-relations often look wonky. The code to compute them is very old, and contains some peculiar things that I think can't be quite right in multi-loop nests when multiple subscripts are involved (at least). But it's also relatively complex, and so we've often just worked around issues with them in the clients that use them (which aren't that many (un)fortunately, so bugs there remain seldom and in corner cases). I'll try (again) to perhaps make sense of our distance vector compute code, and see what can be done about this.