------- Comment #9 from rakdver at atrey dot karlin dot mff dot cuni dot cz  
2006-09-18 08:44 -------
Subject: Re:  IV selection is messed up

> On 17 Sep 2006 22:48:12 -0000, rakdver at gcc dot gnu dot org
> <[EMAIL PROTECTED]> wrote:
> > Regarding the "-fprefetch-loop-arrays's heuristic is way off the mark" part,
> > gcc badly overestimates the size of the loop (it guesses 300 insns).  I will
> > check what I can do with that.
> Provided I understand what you meant, it's the other way around; with
> -fprefetch-loop-arrays, gcc's prefetch distance is much too short.

Which is caused by the overestimation of the loop size.  The heuristic
used to determine the prefetch distance is constant / size of the loop
(which is the best approximation of the well-known formula
memory latency / time to execute the loop body that we can achieve at
the moment).  If the estimate were more precise (say, something like
40 insns for the testcase below), gcc would prefetch 5 iterations ahead
with the default values of the constants, which would be slightly
better (although still not quite enough).
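
To make the arithmetic concrete, here is a minimal sketch of that
heuristic.  It is not gcc's actual code: the helper name, the value 200
for the constant, and the rounding up are all assumptions, chosen only
because they are consistent with the "40 insns gives 5 iterations"
figure above.

```c
/* Assumed value of the latency constant; picked so that a 40-insn loop
   yields 5 iterations ahead, as stated in the text.  Not taken from gcc. */
enum { ASSUMED_LATENCY = 200 };

/* Hypothetical helper: number of iterations to prefetch ahead, computed
   as constant / estimated loop size.  Rounding up is an assumption. */
unsigned prefetch_ahead(unsigned latency, unsigned loop_size)
{
    if (loop_size == 0)
        return 0;   /* no meaningful distance for an empty loop body */
    return (latency + loop_size - 1) / loop_size;
}
```

Under these assumptions, prefetch_ahead(ASSUMED_LATENCY, 300) gives 1
iteration, while prefetch_ahead(ASSUMED_LATENCY, 40) gives 5, which is
why the overestimated size leads to such a short distance.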

> If I remember correctly, that testcase takes a bunch of cycles per
> iteration on my K8 (Opteron 252) and you have to prefetch at the very
> least 256 bytes away to make that profitable; it's less than 128 with
> gcc-4.2-20060908.
> 
> That testcase is pretty silly anyway.
> Here's what I get with the real code and -fprefetch-loop-arrays:
> 
>   4011c2:       movdqa (%ecx),%xmm2
>   4011c6:       lea    0x10(%ecx),%eax
>   4011c9:       movdqa %xmm6,%xmm4
>   4011cd:       dec    %edx
>   4011ce:       movdqa %xmm2,%xmm0
>   4011d2:       mov    %eax,%ecx
>   4011d4:       prefetcht0 (%eax)
>   4011d7:       movdqa %xmm6,%xmm1
>   4011db:       punpckldq %xmm2,%xmm0
>   4011df:       punpckhdq %xmm2,%xmm2
>   4011e3:       movdqa %xmm0,%xmm3
>   4011e7:       punpcklqdq %xmm0,%xmm3
>   4011eb:       punpckhqdq %xmm0,%xmm0
>   4011ef:       pcmpgtd %xmm3,%xmm4
>   4011f3:       pcmpgtd %xmm0,%xmm1
>   4011f7:       paddd  0x10(%esp),%xmm4
>   4011fd:       paddd  %xmm1,%xmm4
>   401201:       movdqa %xmm5,%xmm1
>   401205:       pcmpgtd %xmm3,%xmm1
>   401209:       movdqa %xmm1,%xmm3
>   40120d:       movdqa %xmm5,%xmm1
>   401211:       paddd  %xmm7,%xmm3
>   401215:       pcmpgtd %xmm0,%xmm1
>   401219:       movdqa %xmm6,%xmm0
>   40121d:       paddd  %xmm1,%xmm3
>   401221:       movdqa %xmm2,%xmm1
>   401225:       punpcklqdq %xmm2,%xmm1
>   401229:       punpckhqdq %xmm2,%xmm2
>   40122d:       pcmpgtd %xmm1,%xmm0
>   401231:       paddd  %xmm0,%xmm4
>   401235:       movdqa %xmm6,%xmm0
>   401239:       pcmpgtd %xmm2,%xmm0
>   40123d:       paddd  %xmm0,%xmm4
>   401241:       movdqa %xmm5,%xmm0
>   401245:       movaps %xmm4,0x10(%esp)
>   40124a:       pcmpgtd %xmm1,%xmm0
>   40124e:       paddd  %xmm0,%xmm3
>   401252:       movdqa %xmm5,%xmm0
>   401256:       pcmpgtd %xmm2,%xmm0
>   40125a:       paddd  %xmm0,%xmm3
>   40125e:       movdqa %xmm3,%xmm7
>   401262:       jne    4011c2
> <kdlib::AEBH::streaming_sampling(kdlib::AEBH::streaming_node_t const&,
> kdlib::AEBH::sampler3D_t const&)+0x52>
> 
> Each iteration takes about 8 cycles when not starved, and prefetching
> isn't a win unless done at least 4 or 8 cache lines away, so this one
> is nothing but a hindrance.
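
For scale, a rough sketch of what "4 or 8 cache lines away" means for
the loop quoted above.  The 16-byte stride is read off the listing (the
pointer advances by 0x10 per iteration, and the prefetcht0 targets that
next address, i.e. one iteration ahead).  The 64-byte cache line size
and the helper name are assumptions, not something stated in the
report.

```c
#include <stddef.h>

/* Bytes consumed per iteration: the listing advances %ecx by 0x10. */
enum { STRIDE_BYTES = 16 };

/* Assumed cache line size in bytes; not stated in the thread. */
enum { CACHE_LINE_BYTES = 64 };

/* Hypothetical helper: iterations ahead needed so that the prefetch
   address is at least `lines` cache lines past the current load. */
size_t iterations_for_lines(size_t lines)
{
    size_t bytes = lines * CACHE_LINE_BYTES;
    return (bytes + STRIDE_BYTES - 1) / STRIDE_BYTES;
}
```

With these assumptions, 4 cache lines correspond to 16 iterations ahead
and 8 cache lines to 32, compared with the single iteration (16 bytes)
in the generated code.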
> 
> 
> -- 
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28919
> 

