[Bug tree-optimization/57204] Auto-vectorization in nested loops with non-varying indexed array access results in very poor performance (worse than no auto-vectorization)

snagavallis at outlook dot com Wed, 08 May 2013 17:20:08 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57204


--- Comment #2 from Sasanka Nagavalli <snagavallis at outlook dot com> 
2013-05-09 00:19:50 UTC ---
Created attachment 30067
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30067
Test case 2 for issue 57204

The original test and description are slightly misleading because they could be
interpreted to suggest that the compiler should generate the same optimized
code for the good case and the bad case. That was not the intent of providing
those two tests. The two tests may give the same result algorithmically if all
elements of d are non-negative, but the actual operations can differ: the good
case reads d[i*n+k] once before the j loop, while the bad case re-reads it on
every iteration, including after the j=k iteration may have updated it.
Consider the following (for k=1, i=0, d[i*n+k] is d[1]):

Good:
k=0: ...
k=1:
  i=0:
    d_ik = d[1] 
    j=0: ...
    j=1:
      t = d_ik + d[n+1]
      d[1] = MIN(d[1],t)
    j=2:
      t = d_ik + d[n+2]      <--- This is different
      d[2] = MIN(d[2],t)
    ...

Bad:
k=0: ...
k=1:
  i=0:
    j=0: ...
    j=1:
      t = d[1] + d[n+1]
      d[1] = MIN(d[1],t)
    j=2:
      t = d[1] + d[n+2]      <--- This is different
      d[2] = MIN(d[2],t)
    ...
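
The difference between the two traces can be reproduced with a small sketch.
The loop bodies below are reconstructed from the traces above, not copied from
the original attachment, so treat them as an assumption about what the first
test looked like. The names relax_hoisted, relax_reread, variants_agree and
MAX_N are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_N 8  /* hypothetical upper bound, only to keep the sketch simple */

/* Assumed shape of the "good" case: d[i*n+k] is read once per (k, i). */
void relax_hoisted(float *d, int n)
{
  int i, j, k;
  for (k = 0; k < n; ++k) {
    for (i = 0; i < n; ++i) {
      float d_ik = d[i*n+k];
      for (j = 0; j < n; ++j) {
        float t = d_ik + d[k*n+j];
        d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
      }
    }
  }
}

/* Assumed shape of the "bad" case: d[i*n+k] is re-read on every j. */
void relax_reread(float *d, int n)
{
  int i, j, k;
  for (k = 0; k < n; ++k) {
    for (i = 0; i < n; ++i) {
      for (j = 0; j < n; ++j) {
        float t = d[i*n+k] + d[k*n+j];
        d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
      }
    }
  }
}

/* Runs both variants on copies of init.
   Returns 1 if every element matches, 0 if any differs,
   and -1 if n is outside 1..MAX_N. */
int variants_agree(const float *init, int n)
{
  float a[MAX_N * MAX_N], b[MAX_N * MAX_N];
  int idx, count;
  if (n < 1 || n > MAX_N)
    return -1;
  count = n * n;
  memcpy(a, init, (size_t)count * sizeof(float));
  memcpy(b, init, (size_t)count * sizeof(float));
  relax_hoisted(a, n);
  relax_reread(b, n);
  for (idx = 0; idx < count; ++idx)
    if (a[idx] != b[idx])
      return 0;
  return 1;
}
```

With a non-negative 2x2 input such as {0, 3, 1, 0} the two variants should
agree, while an input with a negative diagonal element such as {-1, 0, 0, 0}
makes them diverge, because the j=k iteration lowers d[i*n+k] and only the
re-reading variant sees the new value.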

I have attached a second set of tests to demonstrate the issue more clearly.
The following cases perform the same operations, but the compiler still
generates worse code for the bad case than for the good case. In the bad case,
it once again generates slower code with auto-vectorization than without it.

New Good:
void foo(float * d, int n)
{
  int i, j, k;
  for (k=0; k<n; ++k) {
    for (i=0; i<n; ++i) {
      for (j=0; j<k; ++j) {
        float t = d[i*n+k] + d[k*n+j];
        d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
      }
      for (j=k; j<n; ++j) {
        float t = d[i*n+k] + d[k*n+j];
        d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
      }
    }
  }
}
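
One way to compare the vectorized and non-vectorized builds is a small timing
harness like the sketch below. It is not part of the attachment: time_kernel,
the fill pattern and the matrix size are hypothetical choices, and foo is
repeated only so that the block compiles on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Copy of the "New Good" kernel from this comment. */
void foo(float *d, int n)
{
  int i, j, k;
  for (k = 0; k < n; ++k) {
    for (i = 0; i < n; ++i) {
      for (j = 0; j < k; ++j) {
        float t = d[i*n+k] + d[k*n+j];
        d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
      }
      for (j = k; j < n; ++j) {
        float t = d[i*n+k] + d[k*n+j];
        d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
      }
    }
  }
}

/* Hypothetical helper: fills an n*n matrix with non-negative values,
   runs the kernel once, and returns the CPU time in seconds.
   Returns -1.0 if the allocation fails. */
double time_kernel(void (*kernel)(float *, int), int n)
{
  size_t count = (size_t)n * (size_t)n;
  size_t idx;
  clock_t start, stop;
  volatile float sink;
  float *d = malloc(count * sizeof *d);
  if (d == NULL)
    return -1.0;
  for (idx = 0; idx < count; ++idx)
    d[idx] = (float)(idx % 97);
  start = clock();
  kernel(d, n);
  stop = clock();
  sink = d[count - 1];  /* read a result so the work is not discarded */
  (void)sink;
  free(d);
  return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_kernel(foo, 1024) from a main function and printing the result,
then building the same file once with "gcc -O3" and once with
"gcc -O3 -fno-tree-vectorize", gives the two numbers to compare. Timings will
vary with the machine and compiler version.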
