adding dependence from prefetch to load

2007-04-11 Thread George Caragea

Hi,
I have a MIPS-like architecture which has prefetch instructions. I'm 
writing an optimization pass that inserts a prefetch instruction for every 
array read. The catch is that I'm trying to do this even when the reads 
are not in a loop.

I have two questions:

1. Is there any work out there that has tried to do this before? All I 
found in the latest gcc-svn was tree-ssa-loop-prefetch.c, but since my 
references are not in a loop, a lot of the things done in there will not 
apply to me.


2. Right now I am inserting a __builtin_prefetch(...) call immediately 
before the actual read, getting something like:

 D.1117_12 = &A[D.1101_14];
 __builtin_prefetch (D.1117_12, 0, 1);
 D.1102_16 = A[D.1101_14];

However, if I enable the instruction scheduler pass, it doesn't see any 
dependence between the prefetch and the load, and it actually moves the 
prefetch after the load, rendering it useless. How can I tell the 
scheduler about this dependence?


My thinking is to also specify a latency for prefetch, so that the 
scheduler will hopefully place the prefetch somewhere earlier in the 
code to partially hide this latency. Do you see anything wrong with this 
approach?
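
A minimal source-level sketch of the placement described above, assuming the address is known well before the value is needed. The helper name read_with_early_prefetch and the filler arithmetic are hypothetical; only __builtin_prefetch and its (address, rw, locality) arguments come from GCC.

```c
#include <stddef.h>

/* Hypothetical helper: reads a[i] once, outside any loop. */
int read_with_early_prefetch(const int *a, size_t i, int x)
{
    /* Issue the prefetch as soon as the address is known.
       Arguments: address, rw = 0 (read), locality = 1 (low temporal locality). */
    __builtin_prefetch(&a[i], 0, 1);

    /* Independent work that does not touch a[i]; this is what would
       overlap with the memory latency. */
    int t = x * 3 + 7;
    t ^= t >> 2;

    /* The actual read. */
    return a[i] + t;
}
```

The prefetch is only a hint and does not change the result; how much of the latency is hidden depends on how much independent work separates it from the load.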


The prefetch instruction in the .md file is defined as:
(define_insn "prefetch"
  [(prefetch (match_operand:QI 0 "address_operand" "p")
             (match_operand 1 "const_int_operand" "n")
             (match_operand 2 "const_int_operand" "n"))]
  ""
{
  operands[1] = mips_prefetch_cookie (operands[1], operands[2]);
  return "pref\t%1,%a0";
}
  [(set_attr "type" "prefetch")])

Thanks,
George



Re: adding dependence from prefetch to load

2007-04-12 Thread George Caragea

Zdenek Dvorak wrote:
>> 2. Right now I am inserting a __builtin_prefetch(...) call immediately
>> before the actual read, getting something like:
>>
>>  D.1117_12 = &A[D.1101_14];
>>  __builtin_prefetch (D.1117_12, 0, 1);
>>  D.1102_16 = A[D.1101_14];
>>
>> However, if I enable the instruction scheduler pass, it doesn't realize
>> there's a dependency between the prefetch and the load, and it actually
>> moves the prefetch after the load, rendering it useless. How can I
>> instruct the scheduler of this dependence?
>>
>> My thinking is to also specify a latency for prefetch, so that the
>> scheduler will hopefully place the prefetch somewhere earlier in the
>> code to partially hide this latency. Do you see anything wrong with this
>> approach?
>
> Well, it assumes that the scheduler works with a long enough lookahead to
> actually be able to move the prefetch far enough; i.e., if the
> architecture you work with is relatively slow in comparison with the
> memory access times, this might be a feasible approach.  However, on
> modern machines, a miss in the L2 cache may take hundreds of cycles, and
> it is not clear to me that the scheduler will be able to move the prefetch
> that far, or indeed that it would even be possible (I think you often do
> not know the address far enough in advance).
  


Well, the target architecture is actually quite peculiar: it's a 
parallel SPMD machine. The only similarity with MIPS is the ISA. The 
latency I'm trying to hide is somewhere around 24 cycles, but because it 
is a parallel machine, up to 1024 threads have to stall for 24 cycles in 
the absence of prefetching, which affects overall performance.
My initial studies show that this latency can be hidden with a properly 
inserted prefetch instruction, and I think that the scheduler can help 
with that, if properly guided.


So my initial question remains: is there any way to tell the scheduler 
not to place the prefetch instruction after the actual read?
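
One blunt source-level way to pin the order, sketched under assumptions: an empty volatile asm with a "memory" clobber placed between the prefetch and the load. GCC documents the "memory" clobber as a compiler-level read/write memory barrier, so the load should not be hoisted above it. Whether the RTL scheduler also keeps the prefetch pattern on the earlier side is an assumption that has to be checked in the scheduled output for the target. The helper name read_after_barrier is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: keeps the load of a[i] after a compiler barrier. */
int read_after_barrier(const int *a, size_t i)
{
    /* Prefetch for read (rw = 0) with low temporal locality (1). */
    __builtin_prefetch(&a[i], 0, 1);

    /* Empty volatile asm with a "memory" clobber: emits no instruction,
       but the compiler must not move memory accesses across it. */
    __asm__ __volatile__("" ::: "memory");

    /* The actual read, which stays below the barrier. */
    return a[i];
}
```

The cost of this approach is that the barrier also blocks unrelated memory operations from moving across it, so it constrains the scheduler more than a single prefetch-to-load dependence would.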


The prefetch instruction takes an address_operand, and it seems all I 
need to do is tell the scheduler that the prefetch will "write" to that 
address, so that it sees a true dependence between the prefetch and the 
read. But I don't know how to do that, and changing the constraint on the 
first operand of the prefetch to "+p" or "+d" in the md file didn't help.


Thanks,
George