------- Comment #2 from changpeng dot fang at amd dot com 2010-08-24 00:03 -------

float f (float *x, float *y, float *z, unsigned n)
{
  float ret = 0.0;
  unsigned i;

  for (i = 0; i < n; i++)
    {
      float diff = x[i] - y[i];
      ret -= diff * diff * z[i];
    }
  return ret;
}

No, this is related to PR 45022 in a certain sense, but the underlying reason is not yet known.

For the above test case, if I compile with -O3 -march=amdfam10 -m64, the loop is not vectorized because of the floating-point reduction. To my surprise, no prefetch is generated. The cost model filtered out the prefetches (we are trying to prefetch for each of the three memory references):

Ahead 15, unroll factor 1, trip count -1
insn count 14, mem ref count 3, prefetch count 3
Not prefetching -- instruction to prefetch ratio (4) too small

However, if we compile with -O3 -ffast-math -march=amdfam10 -m64, the loop can be vectorized, and one of the array references is aligned. As a result, and because of PR 45022, we try to prefetch only for the aligned reference, and one prefetch is inserted (this time the insn-to-prefetch ratio is big enough).

The fix for PR 45022 will actually result in no prefetch being generated, and will thus hide the problem.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391
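
For readers following the numbers in the dump above, here is a minimal sketch of the ratio check that produces the "too small" message. It is not the GCC source: the function names are hypothetical, and the threshold of 10 is an assumption standing in for the min-insn-to-prefetch-ratio parameter, whose actual value may differ between releases. The ratio uses integer division, which is why 14 insns over 3 prefetches is reported as 4.

```c
#include <stdbool.h>

/* ASSUMPTION: stands in for GCC's min-insn-to-prefetch-ratio parameter.
   The value 10 is illustrative and is not taken from the comment.  */
#define MIN_INSN_TO_PREFETCH_RATIO 10

/* Hypothetical helper: integer ratio of loop instructions to prefetches.
   For the dump in this comment, 14 / 3 truncates to 4.  */
static unsigned
insn_to_prefetch_ratio (unsigned ninsns, unsigned prefetch_count)
{
  if (prefetch_count == 0)
    return 0;
  return ninsns / prefetch_count;
}

/* Hypothetical helper: true when the loop has enough instructions per
   prefetch for the prefetches to be considered worthwhile.  */
static bool
prefetch_ratio_ok (unsigned ninsns, unsigned prefetch_count)
{
  if (prefetch_count == 0)
    return false;
  return insn_to_prefetch_ratio (ninsns, prefetch_count)
         >= MIN_INSN_TO_PREFETCH_RATIO;
}
```

Under these assumptions, prefetch_ratio_ok (14, 3) is false, matching the rejection in the non-vectorized case. When only one reference is considered for prefetching, the same instruction count is divided by 1 rather than 3, which would be consistent with the ratio being "big enough" in the -ffast-math case; the instruction count of the vectorized loop is not given in the comment, so that case is not computed here.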