Zdenek Dvorak wrote:
2. Right now I am inserting a __builtin_prefetch(...) call immediately
before the actual read, getting something like:
D.1117_12 = &A[D.1101_14];
__builtin_prefetch (D.1117_12, 0, 1);
D.1102_16 = A[D.1101_14];
However, if I enable the instruction scheduler pass, it doesn't realize
there's a dependency between the prefetch and the load, and it actually
moves the prefetch after the load, rendering it useless. How can I
instruct the scheduler of this dependence?
My thinking is to also specify a latency for prefetch, so that the
scheduler will hopefully place the prefetch somewhere earlier in the
code to partially hide this latency. Do you see anything wrong with this
approach?
Well, it assumes that the scheduler works with a long enough lookahead to
actually be able to move the prefetch far enough; i.e., if the
architecture you work with is relatively slow in comparison with the
memory access times, this might be a feasible approach. However, on
modern machines, a miss in the L2 cache may take hundreds of cycles, and
it is not clear to me that the scheduler will be able to move the prefetch
that far, or indeed, that it would even be possible (I think often you do
not know the address far enough in advance).
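
One way to avoid depending on scheduler lookahead is to issue the
prefetch a fixed number of iterations ahead of the read at the source
level. The following is only a minimal sketch of that idea, assuming a
simple array sum and GCC's __builtin_prefetch; the function name
sum_with_prefetch and the distance PREFETCH_DIST are hypothetical, and
the value 8 is a placeholder that would have to be tuned against the
real miss latency.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; a placeholder to be
   tuned for the target's miss latency. */
#define PREFETCH_DIST 8

/* Sum n ints, requesting each element PREFETCH_DIST iterations before
   it is read. The prefetch is only a hint and never changes the result. */
long sum_with_prefetch(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 1);
        sum += a[i];
    }
    return sum;
}
```

Because the prefetch address here is for a later iteration, it does not
matter where the scheduler places the prefetch relative to the read of
a[i] in the same iteration.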
Well, the target architecture is actually quite peculiar: it's a
parallel SPMD machine. The only similarity with MIPS is the ISA. The
latency I'm trying to hide is somewhere around 24 cycles, but because it
is a parallel machine, up to 1024 threads have to stall for 24 cycles in
the absence of prefetching, which affects overall performance.
My initial studies show that this latency can be hidden with a properly
inserted prefetch instruction, and I think that the scheduler can help
with that, if properly guided.
So my initial question remains: is there any way to tell the scheduler
not to place the prefetch instruction after the actual read?
The prefetch instruction takes an address_operand, and it seems all I
need to do is tell the scheduler that the prefetch will "write" to that address,
so it will see a true dependence between the prefetch and the read. But
I don't know how to do that, and changing the md file to say "+p" or
"+d" for the first operand of the prefetch didn't help.
Thanks,
George