Hello,
> Well, the target architecture is actually quite peculiar: it's a
> parallel SPMD machine. The only similarity with MIPS is the ISA. The
> latency I'm trying to hide is somewhere around 24 cycles, but because it
> is a parallel machine, up to 1024 threads have to stall for 24 cycles in
George Caragea wrote:
So my initial question remains: is there any way to tell the scheduler
not to place the prefetch instruction after the actual read?
You can try changing sched_analyze_2 in sched-deps.c to handle PREFETCH
specially.
You could perhaps handle it similarly to how PRE_DEC is
Zdenek Dvorak wrote:
2. Right now I am inserting a __builtin_prefetch(...) call immediately
before the actual read, getting something like:
D.1117_12 = &A[D.1101_14];
__builtin_prefetch (D.1117_12, 0, 1);
D.1102_16 = A[D.1101_14];
However, if I enable the instruction scheduler pass, it does
Hello,
> 2. Right now I am inserting a __builtin_prefetch(...) call immediately
> before the actual read, getting something like:
> D.1117_12 = &A[D.1101_14];
> __builtin_prefetch (D.1117_12, 0, 1);
> D.1102_16 = A[D.1101_14];
>
> However, if I enable the instruction scheduler pass, it doesn
Hi,
I have a MIPS-like architecture which has prefetch instructions. I'm
writing an optimization pass that inserts prefetch instructions for all
array reads. The catch is that I'm trying to do this even if the reads
are not in a loop.
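
As a rough source-level illustration of the transformation described above (a sketch only, not the pass itself), the effect on a single array read could look like the following. The helper name read_with_prefetch is hypothetical and exists only for this example; __builtin_prefetch is the GCC builtin, called here with rw = 0 (read) and locality = 1.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: load A[i], issuing a
   prefetch of the same address immediately before the load. */
static int read_with_prefetch(const int *A, size_t i)
{
    const int *p = &A[i];

    /* Second argument 0 = prefetch for read; third argument 1 = low
       temporal locality. The prefetch is only a hint and does not
       change the value that is loaded. */
    __builtin_prefetch(p, 0, 1);

    return *p;
}
```

Nothing in this sketch forces the compiler to keep the prefetch ahead of the load; it only shows the intended placement.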
I have two questions:
1. Is there any work out there that