------- Comment #9 from rakdver at atrey dot karlin dot mff dot cuni dot cz 2006-09-18 08:44 -------
Subject: Re: IV selection is messed up

> On 17 Sep 2006 22:48:12 -0000, rakdver at gcc dot gnu dot org
> <[EMAIL PROTECTED]> wrote:
> > Regarding the "-fprefetch-loop-arrays's heuristic is way off the mark" part,
> > gcc badly overestimates the size of the loop (it guesses 300 insns). I will
> > check what I can do with that.
> Provided I understand what you meant, it's the other way around; with
> -fprefetch-loop-arrays gcc's prefetch distance is much too short.

That is caused by the overestimation of the loop size. The heuristic for
determining the prefetch distance is constant / size of the loop (the best
approximation we can currently achieve of the well-known formula
memory latency / time to execute the loop body). If the estimate were more
precise (say, about 40 insns for the testcase below), gcc would prefetch
5 iterations ahead with the default values of the constants, which would be
slightly better (although still not quite enough).

> If I remember correctly, that testcase takes a bunch of cycles per
> iteration on my K8 (Opteron 252) and you have to prefetch at the very
> least 256 bytes away to make that profitable; it's less than 128 with
> gcc-4.2-20060908.
>
> That testcase is pretty silly anyway.
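
To make the heuristic above concrete, here is a minimal sketch in C. It is an illustration, not the code gcc actually uses: the helper name prefetch_ahead is hypothetical, the constant 200 is an assumption inferred from the "40 insns gives 5 iterations" figure above, and rounding the quotient up is also an assumption.

```c
/* Assumed value of the "constant" in the heuristic. It is picked so that
   a 40-insn loop yields 5 iterations ahead, as stated above; treat it as
   an illustrative value, not a documented default. */
#define ASSUMED_PREFETCH_CONSTANT 200u

/* Hypothetical helper: number of iterations to prefetch ahead, computed as
   constant / loop size, rounded up (the rounding is an assumption).
   Returns 0 for an empty loop to avoid dividing by zero. */
static unsigned prefetch_ahead(unsigned constant, unsigned loop_insns)
{
    if (loop_insns == 0)
        return 0;
    return (constant + loop_insns - 1) / loop_insns;
}
```

Under these assumptions, prefetch_ahead(ASSUMED_PREFETCH_CONSTANT, 300) gives 1 iteration ahead, while prefetch_ahead(ASSUMED_PREFETCH_CONSTANT, 40) gives 5, which is why the overestimated loop size translates directly into a prefetch distance that is too short.
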
> Here's what I get with the real code and -fprefetch-loop-arrays
>
> 4011c2: movdqa     (%ecx),%xmm2
> 4011c6: lea        0x10(%ecx),%eax
> 4011c9: movdqa     %xmm6,%xmm4
> 4011cd: dec        %edx
> 4011ce: movdqa     %xmm2,%xmm0
> 4011d2: mov        %eax,%ecx
> 4011d4: prefetcht0 (%eax)
> 4011d7: movdqa     %xmm6,%xmm1
> 4011db: punpckldq  %xmm2,%xmm0
> 4011df: punpckhdq  %xmm2,%xmm2
> 4011e3: movdqa     %xmm0,%xmm3
> 4011e7: punpcklqdq %xmm0,%xmm3
> 4011eb: punpckhqdq %xmm0,%xmm0
> 4011ef: pcmpgtd    %xmm3,%xmm4
> 4011f3: pcmpgtd    %xmm0,%xmm1
> 4011f7: paddd      0x10(%esp),%xmm4
> 4011fd: paddd      %xmm1,%xmm4
> 401201: movdqa     %xmm5,%xmm1
> 401205: pcmpgtd    %xmm3,%xmm1
> 401209: movdqa     %xmm1,%xmm3
> 40120d: movdqa     %xmm5,%xmm1
> 401211: paddd      %xmm7,%xmm3
> 401215: pcmpgtd    %xmm0,%xmm1
> 401219: movdqa     %xmm6,%xmm0
> 40121d: paddd      %xmm1,%xmm3
> 401221: movdqa     %xmm2,%xmm1
> 401225: punpcklqdq %xmm2,%xmm1
> 401229: punpckhqdq %xmm2,%xmm2
> 40122d: pcmpgtd    %xmm1,%xmm0
> 401231: paddd      %xmm0,%xmm4
> 401235: movdqa     %xmm6,%xmm0
> 401239: pcmpgtd    %xmm2,%xmm0
> 40123d: paddd      %xmm0,%xmm4
> 401241: movdqa     %xmm5,%xmm0
> 401245: movaps     %xmm4,0x10(%esp)
> 40124a: pcmpgtd    %xmm1,%xmm0
> 40124e: paddd      %xmm0,%xmm3
> 401252: movdqa     %xmm5,%xmm0
> 401256: pcmpgtd    %xmm2,%xmm0
> 40125a: paddd      %xmm0,%xmm3
> 40125e: movdqa     %xmm3,%xmm7
> 401262: jne        4011c2
> <kdlib::AEBH::streaming_sampling(kdlib::AEBH::streaming_node_t const&,
> kdlib::AEBH::sampler3D_t const&)+0x52>
>
> Each iteration takes about 8 cycles when not starved, and prefetching
> isn't a win unless done at least 4 or 8 cachelines away, so this one
> is nothing but a hindrance.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28919