> Hi,
>
> I've noticed that GCC doesn't like to vectorize my loop.
>
> 1. When the loop has non-unit stride, I get 'complicated access pattern'
> message. Are non-unit strides supported?
>
> res(1:nS) = grid(1:(43-1)*7+1:7)*Dummy ! COMPLICATED ACCESS PATTERN
>

Currently only power-of-2 strides are supported. The vectorizer dump file
(generated with -fdump-tree-vect-details) shows that the step in this case
is 28 bytes, i.e. a stride of 7 elements of 4 bytes each:
"
base_address: grid_17(D)
offset from base address: 0
constant offset from base address: 0
step: 28
aligned to: 128
base_object: (*grid_17(D))[0]
symbol tag: SMT.17
"

> 2. When a stride is not a compile time constant, then I get 'data ref'
> error upon vectorization, instead of 'complicated access pattern'.
>
> res(1:nS) = grid(1: (nS-1)*iNew+1 : iNew)*Dummy !NOT VECTORIZED DATA REF
>

This is because vectorization analysis fails earlier, during data-reference
analysis: we are currently unable to represent data references whose step
is not a compile-time constant. Again from the vectorizer dump file:
"
t.f90:29: note: === vect_analyze_data_refs ===
Creating dr for (*grid_15(D))[D.996_14]
...
failed: evolution of offset is not affine.
...
base_address:
offset from base address:
constant offset from base address:
step:
aligned to:
base_object: (*grid_15(D))[0]
symbol tag: SMT.15
...
t.f90:29: note: not vectorized: data ref analysis failed D.997_16 = (*grid_15(D))[D.996_14]
t.f90:29: note: bad data references.
"
(This issue was actually discussed in the past; see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113.)

> This is a part of more complicated loop that is otherwise fully
> vectorizable, and I wonder is it some minor issue so that fix could be
> tried, or is it something fundamental that would never allow the loop to
> be vectorized.
>

Non-power-of-2 strides may be supported one day. Power-of-2 strides are
generally cheaper to support on most SIMD platforms, which is why they came
first, but in principle the other strides could be handled too. Unknown
strides are trickier: we'd need something like a "gather" operation, which
is generally not available on common SIMD platforms.
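
In the meantime, one possible way around the unknown stride is to do the
"gather" by hand: copy the strided elements into a contiguous temporary in
a separate scalar loop, and keep the rest of the computation on unit-stride
data, which is the case the test already reports as vectorized. The
following is only a sketch under that assumption. The names test_packed and
tmp are made up for this illustration, and whether the extra copy pays off
depends on how much work the rest of the real loop does; it has not been
measured here.

```fortran
SUBROUTINE test_packed(iNew, grid, Dummy, res)
  ! Hypothetical variant of the "test" subroutine from the original post.
  INTEGER, PARAMETER :: nS = 43
  INTEGER, INTENT(in) :: iNew
  REAL(4), INTENT(in) :: grid(*)
  REAL(4), INTENT(in) :: Dummy
  REAL(4), INTENT(out) :: res(*)

  REAL(4) :: tmp(nS)   ! contiguous copy of the strided elements
  INTEGER :: i

  ! Manual gather: the step iNew is only known at run time, so this
  ! loop is expected to stay scalar.
  DO i = 1, nS
    tmp(i) = grid((i-1)*iNew + 1)
  END DO

  ! Unit-stride arithmetic on the packed data; same access pattern as
  ! the grid(1:nS:1) statement in the original test.
  res(1:nS) = tmp(1:nS)*Dummy
END SUBROUTINE
```

It takes the same arguments as the original test subroutine, so it can be
called from the same driver program. With -ftree-vectorizer-verbose=2 the
gather loop would still be expected to show up as not vectorized, and only
the second statement as a candidate.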
dorit

> ---------------------------------------------------------------------------
> $ cat -> test_nice.for
> PROGRAM prog
>   INTEGER, PARAMETER :: N = 1000, nS = 43
>   INTEGER :: iS
>
>   REAL(4) :: grid(N)
>   REAL(4) :: res(nS)
>
>   EXTERNAL test
>
>   res = 0
>   DO iS = 1, N
>     grid(iS) = SIN(REAL(is)/N)
>   END DO
>
>   DO iS = 2, 7
>     CALL test(iS, grid, 1.2*iS, res)
>     PRINT *, res
>   END DO
>
> END PROGRAM
>
> SUBROUTINE test(iNew, grid, Dummy, res)
>   INTEGER, PARAMETER :: nS = 43
>   INTEGER, INTENT(in) :: iNew
>   REAL(4), INTENT(in) :: grid(*)
>   REAL(4), INTENT(in) :: Dummy
>   REAL(4), INTENT(out) :: res(*)
>
>   res(1:nS) = grid(1: (nS-1)*iNew+1 : iNew)*Dummy !NOT VECTORIZED DATA REF
>   res(1:nS) = grid(1:(43-1)*7+1:7)*Dummy ! COMPLICATED ACCESS PATTERN
>   res(1:nS) = grid(1:nS:1)*Dummy ! VECTORIZED
> END SUBROUTINE
>
>
>
> $ gfortran -O2 -fno-backslash -ftree-vectorize
> -ftree-vectorizer-verbose=2 -ffast-math -msse4 test_nice.f90 -o
> test_nice.exe
>
> test_nice.f90:31: note: LOOP VECTORIZED.
> test_nice.f90:30: note: not vectorized: complicated access pattern.
> test_nice.f90:29: note: not vectorized: data ref analysis failed
> D.1037_18 = (*grid_17(D))[D.1036_16]
> test_nice.f90:22: note: vectorized 1 loops in function.
>
> test_nice.f90:11: note: not vectorized: relevant stmt not supported:
> D.1008_7 = __builtin_sinf (D.1007_6)
> test_nice.f90:1: note: vectorized 0 loops in function.
> --
> Cheers
> Michal
>