> Hi,
>
> I've noticed that GCC doesn't like to vectorize my loop.
>
> 1. When the loop has non-unit stride, I get 'complicated access pattern'
> message. Are non-unit strides supported?
>
> res(1:nS) = grid(1:(43-1)*7+1:7)*Dummy ! COMPLICATED ACCESS PATTERN
>

Currently only power-of-2 strides are supported. The vectorizer dump file
(produced with -fdump-tree-vect-details) shows that the step in this case
is 28 bytes, i.e. a stride of 7 REAL(4) elements of 4 bytes each:
 "
        base_address: grid_17(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 28
        aligned to: 128
        base_object: (*grid_17(D))[0]
        symbol tag: SMT.17
"
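The 28-byte figure is simply the element stride times the element size. Below is a minimal C sketch of that arithmetic. It assumes `float` has the same 4-byte size as REAL(4), and the helpers `step_bytes` and `is_pow2` are illustrative names, not GCC internals:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative helper (not a GCC internal): byte step of a strided
   access over float, assumed here to match Fortran REAL(4) (4 bytes). */
static size_t step_bytes(size_t elem_stride)
{
    assert(elem_stride > 0);
    return elem_stride * sizeof(float);
}

/* True if x is a non-zero power of two. */
static bool is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

With these helpers, `step_bytes(7)` is 28 and `is_pow2(7)` is false, matching the dump. The element size is itself a power of 2, so checking the element stride or the byte step gives the same answer here.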

> 2. When a stride is not a compile time constant, then I get 'data ref'
> error upon vectorization, instead of 'complicated access pattern'.
>
> res(1:nS) = grid(1: (nS-1)*iNew+1 : iNew)*Dummy !NOT VECTORIZED DATA REF
>

This is because vectorization analysis fails earlier, during data-reference
analysis: we are currently unable to represent data references whose step
is not a compile-time constant. Again, from the vectorizer dump file:
"
t.f90:29: note: === vect_analyze_data_refs ===
Creating dr for (*grid_15(D))[D.996_14]
...
failed: evolution of offset is not affine.
...
        base_address:
        offset from base address:
        constant offset from base address:
        step:
        aligned to:
        base_object: (*grid_15(D))[0]
        symbol tag: SMT.15
...
t.f90:29: note: not vectorized: data ref analysis failed D.997_16 = (*grid_15(D))[D.996_14]
t.f90:29: note: bad data references.
".

(This issue was actually discussed in the past - see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113).
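For comparison, here is a rough C analogue of the three array assignments in the quoted `test` subroutine, using 0-based indexing. The function names are hypothetical, and this is a hand translation, not the code gfortran actually produces. What matters is the address of each load:

- In the first function it is `grid + i*iNew`, whose step depends on a run-time value.
- In the second, the step is the constant 7 elements.
- In the third, the step is 1.

`restrict` stands in for the Fortran rule that these dummy arguments do not alias.

```c
#include <assert.h>

#define NS 43  /* nS in the Fortran source */

/* res(1:nS) = grid(1:(nS-1)*iNew+1:iNew)*Dummy
   Step is iNew elements: not known at compile time. */
void scale_runtime_stride(float *restrict res, const float *restrict grid,
                          int iNew, float Dummy)
{
    assert(iNew > 0);
    for (int i = 0; i < NS; i++)
        res[i] = grid[i * iNew] * Dummy;
}

/* res(1:nS) = grid(1:(43-1)*7+1:7)*Dummy
   Constant step of 7 elements (28 bytes), not a power of 2. */
void scale_stride7(float *restrict res, const float *restrict grid, float Dummy)
{
    for (int i = 0; i < NS; i++)
        res[i] = grid[i * 7] * Dummy;
}

/* res(1:nS) = grid(1:nS:1)*Dummy
   Unit stride: contiguous loads. */
void scale_unit(float *restrict res, const float *restrict grid, float Dummy)
{
    for (int i = 0; i < NS; i++)
        res[i] = grid[i] * Dummy;
}
```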


> This is part of a more complicated loop that is otherwise fully
> vectorizable, and I wonder whether this is a minor issue for which a fix
> could be tried, or something fundamental that would never allow the loop
> to be vectorized.
>

Non-power-of-2 strides may be supported one day. Power-of-2 strides are
simply cheaper to support on most SIMD platforms, but in principle other
constant strides could be handled too. Unknown strides are trickier: we'd
need something like a "gather" operation, which is generally not available
on common SIMD platforms.
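A "gather" loads elements from non-contiguous addresses into consecutive vector lanes. One possible source-level workaround on targets without such an instruction is sketched below. This is an assumption on my part, not something the vectorizer does automatically.

The idea is to split the loop into two passes:

1. A scalar loop gathers the strided elements into a contiguous scratch buffer.
2. A unit-stride loop does the arithmetic, which is the form reported as vectorized above.

Whether this pays off depends on how much work the second loop does. For a single multiply, the extra copy may cost more than it saves. All function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar "gather": tmp[i] = src[i * stride] for i in [0, n). */
static void gather_strided(float *restrict tmp, const float *restrict src,
                           size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        tmp[i] = src[i * stride];
}

/* Unit-stride kernel: contiguous loads and stores only. */
static void scale_contiguous(float *restrict res, const float *restrict tmp,
                             size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)
        res[i] = tmp[i] * factor;
}

/* Hypothetical wrapper: res[i] = grid[i * stride] * factor, split into a
   gather step and a unit-stride step. tmp must hold at least n floats. */
void scale_strided_via_tmp(float *restrict res, const float *restrict grid,
                           float *restrict tmp, size_t n, size_t stride,
                           float factor)
{
    assert(stride > 0);
    gather_strided(tmp, grid, n, stride);
    scale_contiguous(res, tmp, n, factor);
}
```

In the quoted subroutine, `tmp` would be a 43-element scratch array and `stride` would be `iNew`.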

dorit

>
---------------------------------------------------------------------------
> $ cat -> test_nice.f90
> PROGRAM prog
>   INTEGER, PARAMETER :: N = 1000, nS = 43
>   INTEGER :: iS
>
>   REAL(4) :: grid(N)
>   REAL(4) :: res(nS)
>
>   EXTERNAL test
>
>   res = 0
>   DO iS = 1, N
>     grid(iS) = SIN(REAL(is)/N)
>   END DO
>
>   DO iS = 2, 7
>     CALL test(iS, grid, 1.2*iS, res)
>     PRINT *, res
>   END DO
>
> END PROGRAM
>
>   SUBROUTINE test(iNew, grid, Dummy, res)
>     INTEGER, PARAMETER   :: nS = 43
>     INTEGER, INTENT(in)  :: iNew
>     REAL(4), INTENT(in)  :: grid(*)
>     REAL(4), INTENT(in)  :: Dummy
>     REAL(4), INTENT(out) :: res(*)
>
>     res(1:nS) = grid(1: (nS-1)*iNew+1 : iNew)*Dummy !NOT VECTORIZED DATA REF
>     res(1:nS) = grid(1:(43-1)*7+1:7)*Dummy       ! COMPLICATED ACCESS PATTERN
>     res(1:nS) = grid(1:nS:1)*Dummy               ! VECTORIZED
>   END SUBROUTINE
>
>
>
> $ gfortran -O2  -fno-backslash -ftree-vectorize -ftree-vectorizer-verbose=2  -ffast-math -msse4 test_nice.f90 -o test_nice.exe
>
> test_nice.f90:31: note: LOOP VECTORIZED.
> test_nice.f90:30: note: not vectorized: complicated access pattern.
> test_nice.f90:29: note: not vectorized: data ref analysis failed D.1037_18 = (*grid_17(D))[D.1036_16]
> test_nice.f90:22: note: vectorized 1 loops in function.
>
> test_nice.f90:11: note: not vectorized: relevant stmt not supported: D.1008_7 = __builtin_sinf (D.1007_6)
> test_nice.f90:1: note: vectorized 0 loops in function.
> --
>   Cheers
>     Michal
>
