[Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi

rguenth at gcc dot gnu.org Mon, 03 Feb 2014 05:53:04 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042


--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
With some more dumping I see

himenobmtxpa.c:296:9: note: === vect_prune_runtime_alias_test_list ===
himenobmtxpa.c:296:9: note: merging ranges for *_205, *_324 and *_49, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_205, *_324 and *_192, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_168, *_324 and *_69, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_168, *_324 and *_154, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_265, *_324 and *_296, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_265, *_324 and *_89, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_174, *_324 and *_248, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_174, *_324 and *_161, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_211, *_324 and *_231, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_211, *_324 and *_199, *_324
himenobmtxpa.c:296:9: note: improved number of alias checks from 31 to 21

and

Creating dr for *_205
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1009 * 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1009 * 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_168
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1023 * 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1023 * 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_265
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1034 * 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1034 * 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_174
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1063 * 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1063 * 4)
        Access function 0: {0B, +, 4}_7
...
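A simplified model may help make the "merging ranges" notes concrete. Each dr above has access function {0B, +, 4}_7, i.e. it advances 4 bytes per iteration of loop 7, so over niters iterations it covers a segment of 4 * niters bytes from its base. Checks against the same segment (*_324 here) can then be merged when the other segments share a base and overlap or touch. This is a hypothetical sketch, not the code in vect_prune_runtime_alias_test_list: struct seg, struct check, try_merge and merge_checks are made-up names, symbolic bases are reduced to integer ids, and the merge rule is a single greedy pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the address segment a data reference covers:
   a symbolic base (stood in for by an integer id), a constant byte
   offset from that base, and a length in bytes. For an access function
   {0B, +, step} over niters iterations, len = step * niters. */
struct seg { int base_id; long offset; long len; };

/* One runtime alias check: segment a must not overlap segment b. */
struct check { struct seg a, b; };

static bool same_seg(const struct seg *x, const struct seg *y)
{
  return x->base_id == y->base_id && x->offset == y->offset
         && x->len == y->len;
}

/* Widen *x to also cover *y if both share a base and their ranges
   overlap or touch; returns false (leaving *x unchanged) otherwise. */
static bool try_merge(struct seg *x, const struct seg *y)
{
  assert(x->len >= 0 && y->len >= 0);
  if (x->base_id != y->base_id)
    return false;
  long xlo = x->offset, xhi = x->offset + x->len;
  long ylo = y->offset, yhi = y->offset + y->len;
  if (ylo > xhi || xlo > yhi)
    return false;
  long lo = xlo < ylo ? xlo : ylo;
  long hi = xhi > yhi ? xhi : yhi;
  x->offset = lo;
  x->len = hi - lo;
  return true;
}

/* Single greedy pass: fold each check into an earlier one that tests
   against the same b segment when their a segments can be merged.
   Works in place and returns the number of checks left. */
static int merge_checks(struct check *c, int n)
{
  int out = 0;
  for (int i = 0; i < n; i++) {
    bool merged = false;
    for (int j = 0; j < out && !merged; j++)
      if (same_seg(&c[j].b, &c[i].b) && try_merge(&c[j].a, &c[i].a))
        merged = true;
    if (!merged)
      c[out++] = c[i];
  }
  return out;
}
```

In this model, three checks for p[k-1], p[k] and p[k+1] against the same write segment collapse into one, while references whose bases differ symbolically (like the pretmp_1009, pretmp_1023, pretmp_1034 and pretmp_1063 bases above) stay separate.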

so the remaining DDRs against *_324 all look related; the pretmp values in their base addresses are computed before the loop as:

  pretmp_1062 = pretmp_1020 + pretmp_1047;
  pretmp_1063 = _25 * pretmp_1062;

  pretmp_1033 = j_380 + pretmp_1020;
  pretmp_1034 = _25 * pretmp_1033;

  pretmp_1022 = pretmp_1020 + pretmp_1021;
  pretmp_1023 = _25 * pretmp_1022;
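Spelled out, the statements above suggest that the bases of *_174, *_265 and *_168 all have the form pretmp_1004 + _25 * (pretmp_1020 + x) * 4 for a loop-invariant x (pretmp_1047, j_380 or pretmp_1021), so the distance between two of them does not depend on pretmp_1020. The sketch below only illustrates that arithmetic; reading _25 as a row stride in elements is an assumption, base_offset and base_distance are hypothetical names, and plain long arithmetic stands in for the sizetype conversions in the dump.

```c
#include <assert.h>

/* Byte offset of a base address from pretmp_1004, following the pattern
   base = pretmp_1004 + (sizetype) (_25 * (pretmp_1020 + x)) * 4.
   stride stands for _25, row0 for pretmp_1020, and x for the
   per-reference addend (pretmp_1047, j_380 or pretmp_1021).
   Overflow is ignored in this sketch. */
static long base_offset(long stride, long row0, long x)
{
  return stride * (row0 + x) * 4;
}

/* Distance between two such bases: row0 cancels, leaving a value that is
   loop invariant but still symbolic in the stride and the addends. */
static long base_distance(long stride, long x1, long x2)
{
  return stride * (x1 - x2) * 4;
}
```

Proving that relation requires looking at the definitions before the loop, which is exactly the information the alias-check pruning does not have.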

but SCEV doesn't expand statements defined before the loop and thus doesn't see this.
It's obviously far from trivial to merge segments with symbolic start
addresses, and these are multi-dimensional accesses:

        for(k=1 ; k<kmax ; k++){
          s0= MR(a,0,i,j,k)*MR(p,0,i+1,j,  k)
            + MR(a,1,i,j,k)*MR(p,0,i,  j+1,k)
            + MR(a,2,i,j,k)*MR(p,0,i,  j,  k+1)
            + MR(b,0,i,j,k)
             *( MR(p,0,i+1,j+1,k) - MR(p,0,i+1,j-1,k)
              - MR(p,0,i-1,j+1,k) + MR(p,0,i-1,j-1,k) )
            + MR(b,1,i,j,k)
             *( MR(p,0,i,j+1,k+1) - MR(p,0,i,j-1,k+1)
              - MR(p,0,i,j+1,k-1) + MR(p,0,i,j-1,k-1) )
            + MR(b,2,i,j,k)
             *( MR(p,0,i+1,j,k+1) - MR(p,0,i-1,j,k+1)
              - MR(p,0,i+1,j,k-1) + MR(p,0,i-1,j,k-1) )
            + MR(c,0,i,j,k) * MR(p,0,i-1,j,  k)
            + MR(c,1,i,j,k) * MR(p,0,i,  j-1,k)
            + MR(c,2,i,j,k) * MR(p,0,i,  j,  k-1)
            + MR(wrk1,0,i,j,k);

          ss= (s0*MR(a,3,i,j,k) - MR(p,0,i,j,k))*MR(bnd,0,i,j,k);

          gosa+= ss*ss;
          MR(wrk2,0,i,j,k)= MR(p,0,i,j,k) + omega*ss;
        }

and we manage to merge the ±1 accesses in the fastest-varying dimension AFAIK,
but not, for example, the ones for MR(p,0,i+1,j+1,k) and MR(p,0,i+1,j-1,k).
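Linearizing the index shows why the two cases differ. The sketch assumes that MR(mt,n,r,c,d) addresses a row-major array with dimensions [mnums][mrows][mcols][mdeps], so k (the last index) varies fastest; the macro's definition is not part of this report, and struct dims and mr_index are hypothetical names.

```c
#include <assert.h>

/* Hypothetical dimensions of one himeno matrix; mnums is not needed
   to compute the index. */
struct dims { long mrows, mcols, mdeps; };

/* Row-major linear element index of (n, r, c, d), assumed to match the
   layout MR uses. */
static long mr_index(struct dims m, long n, long r, long c, long d)
{
  assert(r >= 0 && r < m.mrows);
  assert(c >= 0 && c < m.mcols);
  assert(d >= 0 && d < m.mdeps);
  return ((n * m.mrows + r) * m.mcols + c) * m.mdeps + d;
}
```

Under this layout the k±1 neighbours are one element apart, so their segments overlap. MR(p,0,i+1,j+1,k) and MR(p,0,i+1,j-1,k) start 2 * mdeps elements apart, while one reference covers only kmax - 1 consecutive elements in the k loop (and in-bounds k+1 accesses imply kmax < mdeps), so their segments never touch within a single k loop.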
Ideally we would be able to derive a single check for each array
(which would require analyzing the DRs in the outer loops as well to
gather info about the other dimensions).
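A single check per array could look roughly like the following, assuming the ranges of i, j and k from the enclosing loops are known and that the array is row-major with k fastest. Because the linear index grows monotonically in i, j and k, the byte range touched by all references to one array is the hull of the stencil offsets evaluated at the corners of the iteration space. This is a hypothetical illustration, not GCC code; struct off, array_hull and segments_disjoint are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stencil offset (di, dj, dk) of one reference relative
   to (i, j, k). */
struct off { long di, dj, dk; };

/* Byte range [*lo, *hi) touched by nrefs references to one array over
   i in [i0, i1), j in [j0, j1), k in [k0, k1), for a row-major layout
   with k fastest and element size esz. The linear index increases with
   i, j and k, so each reference's extremes are at the corners. */
static void array_hull(const struct off *o, int nrefs, long mcols,
                       long mdeps, long esz, long i0, long i1,
                       long j0, long j1, long k0, long k1,
                       long *lo, long *hi)
{
  assert(nrefs > 0 && i0 < i1 && j0 < j1 && k0 < k1);
  long min = 0, max = 0;
  for (int a = 0; a < nrefs; a++) {
    long first = ((i0 + o[a].di) * mcols + (j0 + o[a].dj)) * mdeps
                 + (k0 + o[a].dk);
    long last = ((i1 - 1 + o[a].di) * mcols + (j1 - 1 + o[a].dj)) * mdeps
                + (k1 - 1 + o[a].dk);
    if (a == 0 || first < min)
      min = first;
    if (a == 0 || last > max)
      max = last;
  }
  *lo = min * esz;
  *hi = (max + 1) * esz;
}

/* The single runtime check: [a_lo, a_hi) and [b_lo, b_hi) are disjoint. */
static bool segments_disjoint(long a_lo, long a_hi, long b_lo, long b_hi)
{
  return a_hi <= b_lo || b_hi <= a_lo;
}
```

For himeno this would mean one check per pair of arrays (for example p against wrk2) instead of one per group of merged reference segments, at the cost of a coarser test.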
