Re: Vectorizing HIRLAM 4: complicated access patterns examined.

Dorit Naishlos Sun, 23 Oct 2005 03:59:50 -0700


It looks like this may be a 64-bit scalar-evolution issue: when I compile on
powerpc-linux with -m64, I also get the
"vect4.f:4: note: not consecutive access"
message.
This problem looks very similar to PR18403, which was resolved a while
ago.

When compiling for 32bit, we get the following representation for the loop:
      # i_2 = PHI <i_25(11), i_41(14)>;
      <L12>:;
      D.505_38 = i_2 + -1;
      D.506_39 = (*b_14)[D.505_38];
      (*a_9)[D.505_38] = D.506_39;
      i_41 = i_2 + 1;
      if (i_2 == D.489_27) goto <L26>; else goto <L27>;

When compiling for 64bit, there is an extra cast:
      # i_2 = PHI <i_27(11), i_45(14)>;
      <L12>:;
      D.691_41 = (int8) i_2;
      D.692_42 = D.691_41 + -1;
      D.693_43 = (*b_16)[D.692_42];
      (*a_10)[D.692_42] = D.693_43;
      i_45 = i_2 + 1;
      if (i_2 == D.674_29) goto <L26>; else goto <L27>;
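
For readers who want to poke at this outside Fortran, here is a rough C
analogue of the loop. It is only a sketch: the function name copy_range and
the use of ptrdiff_t for the widened index are my assumptions, not something
taken from the testcase. On a 32-bit target the cast is a no-op; on a 64-bit
target it plays the role of the (int8) conversion in the dump above.

```c
#include <stddef.h>

/* Hypothetical C analogue of "DO I = ISTART, ISTOP; A(I) = B(I)".
   The counter is a 32-bit int; the array index is computed in
   pointer width, which is where the extra cast comes from on
   64-bit targets. */
void copy_range(float *a, const float *b, int istart, int istop)
{
    for (int i = istart; i <= istop; ++i) {
        ptrdiff_t idx = (ptrdiff_t) i - 1;  /* 1-based to 0-based */
        a[idx] = b[idx];
    }
}
```

Called as copy_range(a, b, 2, 4), this copies elements 2 through 4 (1-based)
and leaves the rest of a alone, just like the Fortran loop.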

From the vectorizer dumps for the 32bit code, it looks like the
access-function computed for the index to array b is quite simple and the
dataref analyzer can easily analyze it (i.e. extract the step - 4B - and
conclude that the access is consecutive):

Created dr for (*b_14)[D.708_38]
        base_address: b_14
        offset from base address: (<unnamed type>) (i_25 * 4 + -4)
        constant offset from base address: 0
        base_object: *b_14
        step: 4B
        base aligned 0
        misalignment from base:
        memtag: TMT.11

In the 64bit case, however, the vectorizer dumps show that the
access-function returned for the index to array b is much more complicated
- the dataref analyzer doesn't seem to be able to extract the
evolution/step in this case, and concludes that the access is
non-consecutive:

Created dr for (*b_16)[D.692_42]
        base_address: b_16
        offset from base address: (<unnamed type>) ((int8) {i_27, +, 1}_2 * 4 + -4)
        constant offset from base address: 0
        base_object: *b_16
        step: 0B
        base aligned 0
        misalignment from base:
        memtag: TMT.11
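
As a sanity check on the arithmetic only (this is not GCC code, and the
function names are made up for illustration), the two offset expressions
can be modelled in C. Ignoring wrap-around of the 32-bit counter, both
advance by 4 bytes per iteration, which is the step one would hope the
analyzer reports in the 64bit case as well:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 32bit dump: i * 4 + -4, computed in 32 bits. */
int64_t offset_32bit(int32_t i)
{
    return (int64_t) (i * 4 + -4);
}

/* Hypothetical model of the 64bit dump: (int8) i * 4 + -4, where the
   counter is widened before the multiply. */
int64_t offset_64bit(int32_t i)
{
    int64_t widened = (int64_t) i;   /* the extra (int8) cast */
    return widened * 4 + -4;
}

/* Asserts that both expressions step by 4 bytes between consecutive
   iterations in [istart, istop]; assumes no 32-bit overflow in range. */
void check_step(int32_t istart, int32_t istop)
{
    for (int32_t i = istart; i < istop; ++i) {
        assert(offset_32bit(i + 1) - offset_32bit(i) == 4);
        assert(offset_64bit(i + 1) - offset_64bit(i) == 4);
    }
}
```

Calling check_step(1, 1000) passes silently. The sketch says nothing about
why the analyzer gives up; it only shows that the cast does not change the
stride for in-range values.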

Should we reopen PR18403?

thanks,
dorit


Toon Moene <[EMAIL PROTECTED]> wrote on 22/10/2005 12:18:57:

>    This one gets vectorized for me, on powerpc-linux:
>
>    ~/mainline_cvs/bin/gfortran -O3 -ftree-vectorize -maltivec
>    -ftree-vectorizer-verbose=4 -S hilaram4.f90
>
>    hilaram4.f90:4: note: Alignment of access forced using peeling.
>    hilaram4.f90:4: note: Vectorizing an unaligned access.
>    hilaram4.f90:4: note: LOOP VECTORIZED.
>    hilaram4.f90:7: note: vectorized 1 loops in function.
>
>    dorit
>
>
>    > L.S.,
>    >
>    > This code:
>    >
>    >       SUBROUTINE S(N)
>    >       DIMENSION A(N), B(N)
>    >       READ*,ISTART,ISTOP,B
>    >       DO I = ISTART, ISTOP
>    >          A(I) = B(I)
>    >       ENDDO
>    >       PRINT*,A
>    >       END
>    >
>    > when compiled thusly:
>    >
>    > $ gfortran -g -S -O3 -ftree-vectorize -ftree-vectorizer-verbose=2 -msse2 vect4.f
>    >
>    > draws the following "not vectorized" message:
>    >
>    > vect4.f:4: note: not vectorized: complicated access pattern.
>    > vect4.f:4: note: vectorized 0 loops in function.
>
> I get the following, using the autovect branch compiler on
> x86_64-unknown-linux-gnu:
>
> vect4.f:4: note: ===== analyze_loop_nest =====
> vect4.f:4: note: === vect_analyze_loop_form ===
> vect4.f:4: note: split exit edge.
> vect4.f:4: note: === get_loop_niters ===
> vect4.f:4: note: ==> get_loop_niters:(<unnamed type>) (D.813_29 - i_27) + 1
> vect4.f:4: note: Symbolic number of iterations is (<unnamed type>)
> (D.813_29 - i_27) + 1
> vect4.f:4: note: === vect_analyze_data_refs ===
> vect4.f:4: note: get vectype with 4 units of type real4
> vect4.f:4: note: vectype: vector real4
> vect4.f:4: note: get vectype with 4 units of type real4
> vect4.f:4: note: vectype: vector real4
> vect4.f:4: note: === vect_analyze_scalar_cycles ===
> vect4.f:4: note: Analyze phi: HEAP.49_363 = PHI <HEAP.49_380(5),
> HEAP.49_381(7)>;
> vect4.f:4: note: virtual phi. skip.
> vect4.f:4: note: Analyze phi: HEAP.42_259 = PHI <HEAP.42_268(5),
> HEAP.42_268(7)>;
> vect4.f:4: note: virtual phi. skip.
> vect4.f:4: note: Analyze phi: HEAP.35_60 = PHI <HEAP.35_312(5),
> HEAP.35_312(7)>;
> vect4.f:4: note: virtual phi. skip.
> vect4.f:4: note: Analyze phi: i_1 = PHI <i_27(5), i_45(7)>;
> vect4.f:4: note: Access function of PHI: {i_27, +, 1}_2
> vect4.f:4: note: step: 1,  init: i_27
> vect4.f:4: note: Detected induction.
> vect4.f:4: note: === vect_pattern_recog ===
> vect4.f:4: note: === vect_mark_stmts_to_be_vectorized ===
> vect4.f:4: note: init: phi relevant? HEAP.49_363 = PHI <HEAP.49_380(5),
> HEAP.49_381(7)>;
> vect4.f:4: note: init: phi relevant? HEAP.42_259 = PHI <HEAP.42_268(5),
> HEAP.42_268(7)>;
> vect4.f:4: note: init: phi relevant? HEAP.35_60 = PHI <HEAP.35_312(5),
> HEAP.35_312(7)>;
> vect4.f:4: note: init: phi relevant? i_1 = PHI <i_27(5), i_45(7)>;
> vect4.f:4: note: init: stmt relevant? <L14>:
> vect4.f:4: note: init: stmt relevant? D.830_39 = (int8) i_1
> vect4.f:4: note: init: stmt relevant? D.831_40 = D.830_39 + -1
> vect4.f:4: note: init: stmt relevant? D.832_43 = (*b_10)[D.831_40]
> vect4.f:4: note: init: stmt relevant? (*a_16)[D.831_40] = D.832_43
> vect4.f:4: note: vec_stmt_relevant_p: stmt has vdefs.
> vect4.f:4: note: mark relevant 2, live 0.
> vect4.f:4: note: init: stmt relevant? i_45 = i_1 + 1
> vect4.f:4: note: init: stmt relevant? if (i_1 == D.813_29) goto
> <L35>; else goto <L29>;
> vect4.f:4: note: init: stmt relevant? <L29>:
> vect4.f:4: note: worklist: examine stmt: (*a_16)[D.831_40] = D.832_43
> vect4.f:4: note: vect_is_simple_use: operand D.832_43
> vect4.f:4: note: def_stmt: D.832_43 = (*b_10)[D.831_40]
> vect4.f:4: note: type of def: 2.
> vect4.f:4: note: worklist: examine use 2: D.832_43
> vect4.f:4: note: mark relevant 2, live 0.
> vect4.f:4: note: worklist: examine stmt: D.832_43 = (*b_10)[D.831_40]
> vect4.f:4: note: === vect_analyze_data_refs_alignment ===
> vect4.f:4: note: vect_compute_data_ref_alignment:
> vect4.f:4: note: Unknown alignment for access: *b_10
> vect4.f:4: note: vect_compute_data_ref_alignment:
> vect4.f:4: note: Unknown alignment for access: *a_16
> vect4.f:4: note: === vect_determine_vectorization_factor ===
> vect4.f:4: note: ==> examining statement: <L14>:
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: D.830_39 = (int8) i_1
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: D.831_40 = D.830_39 + -1
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: D.832_43 = (*b_10)[D.831_40]
> vect4.f:4: note: vectype: vector real4
> vect4.f:4: note: nunits = 4
> vect4.f:4: note: ==> examining statement: (*a_16)[D.831_40] = D.832_43
> vect4.f:4: note: vectype: vector real4
> vect4.f:4: note: nunits = 4
> vect4.f:4: note: ==> examining statement: i_45 = i_1 + 1
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: if (i_1 == D.813_29) goto
> <L35>; else goto <L29>;
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: <L29>:
> vect4.f:4: note: skip.
> vect4.f:4: note: === vect_determine_vectorization_factor ===
> vect4.f:4: note: ==> examining statement: <L14>:
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: D.830_39 = (int8) i_1
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: D.831_40 = D.830_39 + -1
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: D.832_43 = (*b_10)[D.831_40]
> vect4.f:4: note: vectype: vector real4
> vect4.f:4: note: nunits = 4
> vect4.f:4: note: ==> examining statement: (*a_16)[D.831_40] = D.832_43
> vect4.f:4: note: vectype: vector real4
> vect4.f:4: note: nunits = 4
> vect4.f:4: note: ==> examining statement: i_45 = i_1 + 1
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: if (i_1 == D.813_29) goto
> <L35>; else goto <L29>;
> vect4.f:4: note: skip.
> vect4.f:4: note: ==> examining statement: <L29>:
> vect4.f:4: note: skip.
> vect4.f:4: note: === vect_analyze_dependences ===
> vect4.f:4: note: === vect_analyze_data_ref_accesses ===
> vect4.f:4: note: not consecutive access
>                  ^^^^^^^^^^^^^^^^^^^^^^ This is incorrect.  The accesses *are*
>                                         consecutive; it's just that there is
>                                         a "jump" at the beginning.
> vect4.f:4: note: not vectorized: complicated access pattern.
> vect4.f:4: note: bad data access.
>
> Hope this helps.
>
> Kind regards,
>
> --
> Toon Moene - e-mail: [EMAIL PROTECTED] - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> A maintainer of GNU Fortran 95: http://gcc.gnu.org/fortran/