[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

changpeng dot fang at amd dot com Mon, 23 Aug 2010 17:04:07 -0700


------- Comment #2 from changpeng dot fang at amd dot com  2010-08-24 00:03 -------
float f (float *x, float *y, float *z, unsigned n)
{
  float ret = 0.0;
  unsigned i;
  for (i = 0; i < n; i++)
    {
      float diff = x[i] - y[i];
      ret -= diff * diff * z[i];
    }
  return ret;
}


No. This is related to PR 45022 in a certain sense, but the underlying
reason is not yet known.

For the above test case, if I compile with -O3 -march=amdfam10 -m64,
the loop is not vectorized because of the floating-point reduction.
To my surprise, no prefetch is generated. The cost model filtered out
the prefetches (we are trying to prefetch for each of the three memory
references):
Ahead 15, unroll factor 1, trip count -1
insn count 14, mem ref count 3, prefetch count 3
Not prefetching -- instruction to prefetch ratio (4) too small
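
The ratio in the dump above can be reproduced by hand. The following is
a minimal sketch of the check, not the actual GCC code: the function
names are made up for illustration, and the threshold value is a
placeholder assumption, since the real limit is a tuning parameter whose
value is not given in this comment. With the numbers from the dump
(unroll factor 1, insn count 14, prefetch count 3) the integer division
gives 4, which matches the reported ratio.

```c
#include <stdbool.h>

/* ASSUMPTION: placeholder threshold. The real limit is a GCC tuning
   parameter; its value is not stated in this comment.  */
#define MIN_INSN_TO_PREFETCH_RATIO 9

/* Instructions executed per issued prefetch, using integer division.
   The caller must pass prefetch_count > 0.  */
static unsigned
insn_to_prefetch_ratio (unsigned unroll_factor, unsigned insn_count,
                        unsigned prefetch_count)
{
  return (unroll_factor * insn_count) / prefetch_count;
}

/* Hypothetical helper: true if the loop has enough instructions per
   prefetch for prefetching to be considered worthwhile.  */
static bool
prefetch_ratio_ok (unsigned unroll_factor, unsigned insn_count,
                   unsigned prefetch_count)
{
  if (prefetch_count == 0)
    return false;
  return insn_to_prefetch_ratio (unroll_factor, insn_count, prefetch_count)
         >= MIN_INSN_TO_PREFETCH_RATIO;
}
```

Because the prefetch count is the divisor, dropping from three prefetches
to one raises the ratio roughly threefold for a loop of similar size,
which is why the number of references considered matters so much here.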

However, if we compile with -O3 -ffast-math -march=amdfam10 -m64,
the loop can be vectorized, and one of the array references is
aligned. As a result, and due to PR 45022, we try to prefetch only
for the aligned reference, and one prefetch is inserted (this time,
the insns-to-prefetch ratio is big enough).
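
To make the single-prefetch case concrete, here is a hand-written
approximation of a loop that prefetches for only one of the three
references. It is a sketch under several assumptions, not the code the
compiler generates: it uses the scalar loop rather than the vectorized
one, it picks x arbitrarily (the comment does not say which reference is
aligned), it reuses the distance 15 from the "Ahead 15" line of the
non-vectorized dump and treats it as a count of iterations, and it
guards the prefetch so that no out-of-range address is formed. The name
f_prefetch_x is made up for illustration.

```c
/* ASSUMPTION: distance taken from "Ahead 15" in the dump, interpreted
   as loop iterations.  */
#define PREFETCH_AHEAD 15

float
f_prefetch_x (const float *x, const float *y, const float *z, unsigned n)
{
  float ret = 0.0f;
  unsigned i;
  for (i = 0; i < n; i++)
    {
      /* Prefetch for x only; y and z get no prefetch.  The guard is
         equivalent to i + PREFETCH_AHEAD < n without overflow.  */
      if (n - i > PREFETCH_AHEAD)
        __builtin_prefetch (&x[i + PREFETCH_AHEAD], 0, 3);

      float diff = x[i] - y[i];
      ret -= diff * diff * z[i];
    }
  return ret;
}
```

The function returns the same value as f above; only the memory hints
differ.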

The fix for PR 45022 will actually result in no prefetch being
generated, and will thus hide the problem.




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391
