On Tue, Aug 04, 2009 at 12:35:15PM +0300, Dorit Nuzman wrote: > > Hi Jack, > AFAIK this topic has not been addressed (closest thing is Richard > Guenther's work on versioning unknown strides > http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01174.html) and I don't know > about the prospects of this being done for gcc4.5 (I'd definitely lobby for > it... do you know which of the polyhedron benchmarks have this issue, > and/or which other benchmarks suffer from it?) > Thanks for the reminder... > dorit >
Dorit, While I don't have the list of effected benchmarks handy, a testcase which Tobias Burnus made is discussed in the previous email below. I believe the issue was supposed to be pretty wide spread in the benchmarks. Jack > > Tobias Burnus <bur...@net-b.de> wrote on 13/08/2007 22:58:54: > > > Hi Dorit and Jack, > > > > Dorit Nuzman wrote: > > > I actually wouldn't expect much improvement yet, as this initial > version > > > still has quite a few limitations. But if there are loops that you > expect > > > to get vectorized that do not, I'd be happy to take a look. > > > > > If something can be improved in the frontend, please tell us/the > > gfortraners. > > > > Neither of the two loops get vectorized; the first one should be easier > > than the second one: > > > > subroutine sub(aa,bb,n,m) > > implicit none > > integer, intent(in) :: n,m > > real, intent(inout) :: aa(n,m) > > real, intent(in) :: bb(n,m) > > integer :: i,j > > do i = 1,m > > do j= 2,n > > aa(i,j)= aa(i,j-1)+bb(i,j-1) > > enddo > > enddo > > do j= 2,n > > do i = 1,m > > aa(i,j)= aa(i,j-1)+bb(i,j-1) > > enddo > > enddo > > end subroutine > > end > > > > First loop: > > a.f90:8: note: not vectorized: data ref analysis failed D.1462_61 = > > (*aa_60(D))[D.1461_59] > > Here, AFAIU, the inner-loop accesses non-consecutive data, whereas the > outer-loop accesses consecutive data. So the only way to vectorize this > loop is either to interchange the loops, or to do outer-loop vectorization > (unless we want to gather disjoint elements in each vectorized inner-loop > iteration, which probably we don't). What fails the outer-loop vectorizer > (i.e. when attempting to vectorize the loop in line 7) is the fact that the > inner-loop bound is unknown, in which case the compiler creates a > guard-code before the inner-loop to decide whether to skip it or not, which > makes the structure of the outer-loop more complicated than is currently > unhandled. (This is also what I referred to here: > http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00743.html). > So we get: > a.f90:7: note: not vectorized: too many BBs in loop. > To work around that (just to see how far we can get) I put in constant > bound for the inner-loop. Now outer-loop vectorization fails with: > a.f90:7: note: not vectorized: data ref analysis failed D.1444_64 = > (*aa_63(D))[D.1443_62] > a.f90:7: note: bad data references. > This is because the stride in the inner-loop is unknown (because I guess > the dimention size is unknown). You can see that the data-ref analyzer > reports: > "failed: evolution of offset is not affine." > I'll bring it up with Sebastian and Zdenek, see what they think can be done > within the data-ref analyzer in this resepct. > > After outer-loop vectorization fails we move on to vectorize the loop in > line 8, and fail with the same data-ref analysis problem (unknown stride). > So for now, we need a testcase with a known inner-loop bound and known > dimentions/strides. > However, even when we are able to overcome all the above, we still won't be > able to vectorize the loop unless the inner-loop execution order is > reversed (otherwise there's a true dependnece from a(i,j) to a(i,j-1)). > ifort probably has loop reversal capabilities. We don't reverse loops yet > in the middle-end, and also we don't vectorize loops with a reverse access > yet. > > > Second loop: > > a.f90:13: note: Unknown alignment for access: *aa_60(D) > > a.f90:13: note: Unknown alignment for access: *bb_68(D) > > a.f90:13: note: Unknown alignment for access: *aa_60(D) > > a.f90:13: note: not vectorized: can't determine dependence between > > (*aa_60(D))[D.1461_59] and (*aa_60(D))[D.1456_53] > > > > Here, the outer-loop (in line 12) accesses non-consecutive data, whereas > the inner-loop (in line 13) accesses consecutive data, so the inner-loop is > a better candidate for vectorization. At first, the outer-loop vectorizer > fails with: > a.f90:12: note: not vectorized: too many BBs in loop. > (same as above), but after I change the inner-loop bound to a constnat it > fails with: > a.f90:12: note: evolution of offset is not affine. > (again, same problem as above, because the outer-loop stride is unknown). > > Moving on to the loop at line 13, we get: > a.f90:13: note: not vectorized: can't determine dependence between > (*aa_63(D))[D.1443_91] and (*aa_63(D))[D.1438_85] > a.f90:13: note: bad data dependence. > > This is because the data-dependence analysis fails with the following > message: > " > (compute_affine_dependence > (stmt_a = > D.1444_92 = (*aa_63(D))[D.1443_91]) > (stmt_b = > (*aa_63(D))[D.1438_85] = D.1449_100) > (subscript_dependence_tester > (analyze_overlapping_iterations > (chrec_a = {pretmp.62_120 + 1, +, 1}_4) > (chrec_b = {pretmp.60_49 + 1, +, 1}_4) > (analyze_siv_subscript > siv test failed: unimplemented. > ) > (overlap_iterations_a = not known > ) > (overlap_iterations_b = not known > ) > ) > (dependence classified: scev_not_known) > ) > ) > " > There's already an open PR for this. I'll ping Sebastian about this issue. > > > ifort vectorizes both loops. > > > > > > The following loop is also vectorized with ifort; it is from the > > Polyhedron test suite (ac.f90); exchanging the ix and iy loop does not > > change anything. > > > > SUBROUTINE SUSCEP(L,Iz) > > IMPLICIT NONE > > INTEGER L , Iz(L,L) , iznum, ix, iy > > iznum = 0 > > DO iy = 1 , L > > DO ix = 1 , L > > iznum = iznum + Iz(iy,ix) > > ENDDO > > ENDDO > > END subroutine > > end > > > > gcc only prints: > > test2.f90:1: note: vectorized 0 loops in function. > > the reason you don't see anything here is that GCC optimizes the loop away. > To work around that I added a print of iznum at the end of the function > (and indeed now you can see that the loop survives and the vectorizer > analyzes it). With this change only, the vectorizer reports: > failed: evolution of offset is not affine. > (same problem as above - the stride is unknown cause the dimention size is > unknown). > > If I interchange the loops I get: > b.f90:6: note: Analyze phi: iznum_lsm.74_31 = PHI > <iznum_lsm.74_32(4), iznum_lsm.74_12(6)> > b.f90:6: note: reduction: not commutative/associative: iznum.10_37 > tobias2b.f90:6: note: Unknown def-use cycle pattern. > ... > b.f90:6: note: worklist: examine stmt: iznum.9_36 = iznum_lsm.74_31 > b.f90:6: note: vect_is_simple_use: operand iznum_lsm.74_31 > b.f90:6: note: def_stmt: iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4), > iznum_lsm.74_12(6)> > b.f90:6: note: Unsupported pattern. > b.f90:6: note: not vectorized: unsupported use in stmt. > 2b.f90:6: note: unexpected pattern. > > This happens because we get the following pattern: > # iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4), iznum_lsm.74_12(6)> > ... > iznum.9_36 = iznum_lsm.74_31; > iznum.10_37 = D.1420_35 + iznum.9_36; > iznum_lsm.74_12 = iznum.10_37; > ... > > which is a "complicated" way to write: > # iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4), iznum_lsm.74_12(6)> > ... > iznum_lsm.74_12 = D.1420_35 + iznum_lsm.74_31; > ... > > In other words, the extra assignments confuse the vectorizer. This is a > known problem - we need to extend our reduction detection code to detect > more "complicated" def-use cycles than just the trivial one. I hope this > will get done in the near future... (I'll add this testcase to the relevant > PR). > > Thanks for your feedback, it helps to prioritize the million items on our > todo list... > > dorit > >