------- Comment #2 from pinskia at gcc dot gnu dot org  2010-07-24 20:32 -------
(In reply to comment #1)
> The direct reason is that prefetching could not differentiate the base
> addresses of the vectorized load and store (of a[i]):
> *vect_pa.6_24
> *vect_pa.19_37

Here is a testcase which shows the same issue without the vectorizer (compile
-O2 -fprefetch-loop-arrays -march=amdfam10 -fno-tree-ccp -fno-tree-vrp
-fno-tree-dominator-opts):

float *f();
float aa[1024];
float bb[1024];
void foo(int beta)
{
  int i;
  float *a = aa, *a1 = aa, *b = bb;
  for(i=0; i<1024; i++)
    {
      *a = *a1 + beta * *b;
      a++;
      a1++;
      b++;
    }
}

--

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2010-07-24 20:32:24
               date|                            |
            Summary|Redundant prefetches for the|Redundant prefetches for
                   |vectorized loop             |some loops (vectorizer
                   |                            |produced ones too)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45021
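
For illustration only, the redundancy in the testcase above can be sketched by hand with GCC's __builtin_prefetch. This is not actual compiler output: the function name foo_redundant and the prefetch distance AHEAD are made up for the sketch, and the placement of the prefetches is an assumption about what "redundant" means here. Since a and a1 both start at aa and advance in lockstep, a prefetch for each of them touches the same address.

```c
#define N 1024
#define AHEAD 16   /* hypothetical prefetch distance, in elements */

float aa[N];
float bb[N];

/* Hand-written illustration, not what GCC actually emits. */
void foo_redundant(int beta)
{
  float *a = aa, *a1 = aa, *b = bb;
  int i;
  for (i = 0; i < N; i++)
    {
      if (i + AHEAD < N)   /* keep the address expressions inside the arrays */
        {
          __builtin_prefetch(a + AHEAD, 1, 3);   /* for the store through a */
          __builtin_prefetch(a1 + AHEAD, 0, 3);  /* redundant: same address as a + AHEAD */
          __builtin_prefetch(b + AHEAD, 0, 3);   /* for the load through b */
        }
      *a = *a1 + beta * *b;
      a++;
      a1++;
      b++;
    }
}
```

If the pass recognized that a and a1 share a base address, one of the first two prefetches could be dropped without changing the result of the loop.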