Re: [patch] Improve loop array prefetch for IA-64

Andi Kleen Fri, 02 Jun 2006 20:54:36 -0700

"Steven Bosscher" <[EMAIL PROTECTED]> writes:

> On 6/2/06, Davis, Mark <[EMAIL PROTECTED]> wrote:
> > Question: does gcc now know the difference between prefetching to cache L1
> > via "lfetch", as opposed to prefetching only to level L2 via "lfetch.nt1"?
> 
> The ia64 backend knows the difference, see the prefetch pattern in ia64.md.
> 
> But ia64 is the only backend that supports this kind of explicit
> locality parameter. And since no-one from the ia64 community cared
> much about gcc until recently, gcc's prefetching pass (which is
> limited anyway) does not generate lfetch.nt1 or other prefetches with
> explicit locality parameters.


Actually, x86 with SSE also has prefetches with different locality hints
(T0, T1, T2, NTA).

However, x86 always needs to have the data in L1 cache to do anything
with it, even for FP data, so it might not be very useful to do this
particular optimization for it.

T0 vs NTA is useful though, and at least AMD K8 can make use of them: when
a lot of data is streamed and not reused, NTA is a good idea.
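
To make the T0 vs NTA distinction concrete, here is a rough sketch using
gcc's __builtin_prefetch, whose third argument is a locality level from 0
(no temporal locality) to 3 (keep in all cache levels). As far as I know
the x86 backend maps 0 to prefetchnta and 3 to prefetcht0, but check the
generated assembly. The function names and PREFETCH_AHEAD are made up for
illustration, and the distance is a guess that would need tuning.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value needs tuning. */
#define PREFETCH_AHEAD 64

/* Data is read once and not reused: hint "no temporal locality" (0),
 * which is expected to become prefetchnta on x86 with SSE. */
double sum_streamed(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
        s += a[i];
    }
    return s;
}

/* Data will be touched again soon: hint high temporal locality (3),
 * which is expected to become prefetcht0 on x86 with SSE. */
double sum_reused(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        s += a[i];
    }
    return s;
}
```

The hint only changes cache behaviour, not the result, so both functions
return the same sum. The locality argument has to be a compile-time
constant, which is why there are two functions rather than a parameter.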

> > For floating point data, the latter is the only interesting case because
> > float loads only access the L2.  Thus using "lfetch" for floating point
> > arrays will unnecessarily wipe out the contents of L1.  (gcc 3.2.3 only
> > seems to generate "lfetch", which is why I ask...)
> 
> You could experiment with this for ia64 by hacking issue_prefetch_ref
> in tree-ssa-loop-prefetch.c to issue a prefetch to L2 for floating
> point types.

Perhaps it could generate different prefetches based on the array size being
worked on?

I guess e.g. for a 1MB array walk NTA is probably a good idea (with the 1MB
being a tunable).
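
Written out by hand in C, the idea would look something like the sketch
below. This is only an illustration of the size-based choice, not what the
prefetch pass does today; nta_threshold_bytes, use_nta and sum_by_size are
names I made up, and the prefetch distance is an untuned guess.

```c
#include <stddef.h>

/* Hypothetical tunable: arrays at least this large are treated as streamed. */
static size_t nta_threshold_bytes = 1024 * 1024;

/* Returns nonzero when a walk over array_bytes should use the NTA hint. */
int use_nta(size_t array_bytes)
{
    return array_bytes >= nta_threshold_bytes;
}

double sum_by_size(const double *a, size_t n)
{
    const size_t ahead = 64;                 /* assumed distance, untuned */
    const int stream = use_nta(n * sizeof *a);
    double s = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n) {
            /* The locality argument must be a constant, hence the branch. */
            if (stream)
                __builtin_prefetch(&a[i + ahead], 0, 0);  /* NTA-style */
            else
                __builtin_prefetch(&a[i + ahead], 0, 3);  /* T0-style */
        }
        s += a[i];
    }
    return s;
}
```

A compiler could only do this statically when the array size is known at
compile time; otherwise it would need a runtime check like the one above.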

-Andi
