> On Dec 13, 2014, at 5:22 AM, Ajit Kumar Agarwal
> <[email protected]> wrote:
>
> Hello All:
>
> Since prefetch instructions have no direct consumers in the code stream,
> they give the instruction scheduler considerable freedom. They are
> typically assigned lower priorities than most of the other instructions in
> the code stream. This tends to cause all the prefetch instructions to be
> placed together in the final schedule. Placing them in clumps, rather than
> spreading them evenly, degrades performance.
>
> Spreading the prefetch instructions evenly gives better speedup ratios
> than placing them in clumps for dirty misses.
I can believe that’s true for some processors; is it true for all of them? I
have the impression that some MIPS processors don’t mind clumped prefetches, so
long as you don’t exceed the limit on the total number of concurrently pending
memory accesses.
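
To make the two cases concrete, here is a rough sketch in C. It is not GCC scheduler code; the function names, the slot model, and the idea of treating "pending" as a fixed window of issue slots are all assumptions made for illustration. The first function picks evenly spread slots for the prefetches, and the second reports the largest number of prefetches that land inside any window, which is the number one would compare against a processor's limit on pending memory accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: choose slots for n_prefetch prefetches in a
   schedule of n_slots issue slots so that they are evenly spread.
   Writes the chosen slot indices to out[0..n_prefetch-1], ascending. */
void spread_prefetches(size_t n_slots, size_t n_prefetch, size_t *out)
{
    assert(n_prefetch > 0 && n_prefetch <= n_slots);
    for (size_t i = 0; i < n_prefetch; i++)
        out[i] = i * n_slots / n_prefetch;
}

/* Hypothetical check: largest number of prefetches that fall inside any
   run of `window` consecutive slots. slots[] must be sorted ascending.
   `window` stands in for how long a prefetch stays pending (assumed). */
size_t max_in_window(const size_t *slots, size_t n, size_t window)
{
    size_t best = 0, lo = 0;
    for (size_t hi = 0; hi < n; hi++) {
        /* Drop prefetches that are no longer inside the window. */
        while (slots[hi] - slots[lo] >= window)
            lo++;
        if (hi - lo + 1 > best)
            best = hi - lo + 1;
    }
    return best;
}
```

With 16 slots and 4 prefetches, the spread placement puts at most one prefetch in any 4-slot window, while a clump in slots 12 through 15 puts all four in one window. Whether that clump hurts would then depend on whether four exceeds the processor's pending-access limit.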
paul