Hello,

> 2. Right now I am inserting a __builtin_prefetch(...) call immediately
> before the actual read, getting something like:
>  D.1117_12 = &A[D.1101_14];
>  __builtin_prefetch (D.1117_12, 0, 1);
>  D.1102_16 = A[D.1101_14];
> 
> However, if I enable the instruction scheduler pass, it doesn't realize 
> there's a dependency between the prefetch and the load, and it actually 
> moves the prefetch after the load, rendering it useless. How can I 
> instruct the scheduler of this dependence?
> 
> My thinking is to also specify a latency for prefetch, so that the 
> scheduler will hopefully place the prefetch somewhere earlier in the 
> code to partially hide this latency. Do you see anything wrong with this 
> approach?

Well, this assumes that the scheduler works with a long enough lookahead
to actually move the prefetch far enough ahead of the load; i.e., if the
architecture you work with is relatively slow in comparison with the
memory access times, this might be a feasible approach.  However, on
modern machines, a miss in the L2 cache may take hundreds of cycles, and
it is not clear to me that the scheduler will be able to move the
prefetch that far, or indeed that it would even be possible (I think you
often do not know the address far enough in advance).  Also, prefetching
outside of loops in general appears not to be all that profitable, since
usually most of the time is spent within loops.

So I would recommend first doing some analysis and measurements (say by
introducing the prefetches by hand) to check whether this project really
has potential to lead to significant speedups.
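
For illustration only, a hand-inserted prefetch for such a measurement
might look like the sketch below.  The function name sum_indirect, the
index array idx and the PREFETCH_DISTANCE constant are all made up for
this example; the distance of 16 iterations is an arbitrary starting
value that would need tuning on the target machine.  The (0, 1)
arguments are the same read / low-locality hints as in the quoted
snippet.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch.
   16 is an arbitrary starting point, not a recommended value. */
#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 16
#endif

/* Sum a[idx[i]] for i in [0, n), prefetching the element that will be
   read PREFETCH_DISTANCE iterations later.  The address is known early
   here only because idx[] is already in memory. */
long sum_indirect(const long *a, const size_t *idx, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Guard so that idx[] is never read past its end. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[idx[i + PREFETCH_DISTANCE]], 0, 1);

        sum += a[idx[i]];
    }
    return sum;
}
```

Timing this against the same loop without the prefetch, for a few values
of PREFETCH_DISTANCE and for arrays larger than the cache, should give a
rough idea of how much there is to gain.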

Zdenek
