------- Comment #20 from changpeng dot fang at amd dot com 2010-07-09 01:59 -------

I submitted a patch for review that should completely fix the problem. The patch is an extension of Christian's speedup.patch. It splits the cost analysis into three small functions and stops further prefetching analysis as soon as we know prefetching is not going to be beneficial to the loop.
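To illustrate the idea, here is a minimal sketch of a cost analysis split into three small checks with an early exit. It is not the code from the patch: the struct, the function names, and the threshold values below are all hypothetical and chosen only for the example.

```c
#include <stdbool.h>

/* Hypothetical loop summary; NOT GCC's internal data structures. */
struct loop_summary {
    unsigned est_trip_count;  /* estimated iterations, 0 if unknown */
    unsigned insn_count;      /* instructions in the loop body */
    unsigned mem_ref_count;   /* memory references in the loop body */
};

/* Illustrative thresholds only; assumed values, not GCC parameters. */
#define MIN_TRIP_TO_AHEAD_RATIO 4u
#define MAX_MEM_REFS            200u
#define MIN_INSN_TO_MEM_RATIO   3u

/* Counts how often the costly part of the analysis was reached. */
static unsigned costly_analysis_runs;

/* Check 1: the loop is too short for a prefetch to arrive in time. */
static bool trip_count_too_small(const struct loop_summary *l, unsigned ahead)
{
    return l->est_trip_count != 0
           && l->est_trip_count < ahead * MIN_TRIP_TO_AHEAD_RATIO;
}

/* Check 2: there is something to prefetch, but not an unreasonable amount. */
static bool mem_ref_count_reasonable(const struct loop_summary *l)
{
    return l->mem_ref_count > 0 && l->mem_ref_count <= MAX_MEM_REFS;
}

/* Check 3: too little computation per memory reference to hide latency. */
static bool insn_to_mem_ratio_too_small(const struct loop_summary *l)
{
    return l->insn_count / l->mem_ref_count < MIN_INSN_TO_MEM_RATIO;
}

/* Returns true if prefetching looks worthwhile. Each cheap check can end
   the analysis before the costly step (a stand-in counter here) runs. */
static bool prefetch_is_beneficial(const struct loop_summary *l, unsigned ahead)
{
    if (trip_count_too_small(l, ahead))
        return false;
    if (!mem_ref_count_reasonable(l))   /* also guards the division below */
        return false;

    costly_analysis_runs++;             /* placeholder for expensive work */

    if (insn_to_mem_ratio_too_small(l))
        return false;
    return true;
}
```

With these assumed thresholds, a loop with an estimated trip count of 10 and a prefetch distance of 8 is rejected by the first check, so the costly step never runs for it.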
Here is the gcc-patches@ link:
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00734.html

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576