------- Comment #6 from changpeng dot fang at amd dot com 2010-05-21 21:36 -------
(In reply to comment #5)
> The fix introduced:
>
> FAIL: gcc.dg/tree-ssa/prefetch-7.c scan-assembler-times movnti 18
> FAIL: gcc.dg/tree-ssa/prefetch-7.c scan-tree-dump-times optimized "={nt}" 18
>
> on Linux/ia32.
>
It seems the unrolling differs considerably between architectures. The count of movnti in the assembly code depends on the unroll_factor. I propose removing the movnti check on the assembly code. The aprefetch dump shows that two non-temporal stores are generated, and this is enough.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44185