Re: Handling prefetcher tag collisions while allocating registers

2017-10-24 Thread Bin.Cheng
On Tue, Oct 24, 2017 at 12:44 AM, Kugan Vivekanandarajah wrote:
> Hi All,
>
> I am wondering if there is any way we can prefer certain registers in
> register allocation. That is, I want to have some way of recording
> register allocation decisions (for loads in a loop that are accessed in
> steps) and use this to influence register allocation of other loads
> (again, those that are accessed in steps).
>
> This is for architectures (like Falkor AArch64) that use hardware
> prefetchers that use signatures of the loads to lock onto and tune
> prefetching parameters. Ideally, if the loads are from the same
> stream, they should have the same signature, and if they are from
> different streams, they should have different signatures. The destination
> register, base register and offset are used in the signature. Therefore,
> selecting a different register can influence this.
I wonder why the destination register is used in the signature.  In an extreme
case, a load in a loop can be unrolled and then allocated to different dest
registers. Forcing the same dest register could be too restrictive.

Thanks,
bin

>
> In LLVM, this is implemented as a machine-specific pass that runs
> after register allocation. It then inserts mov instructions with
> scratch registers to manage this. We can do a machine reorg pass in
> GCC, but detecting strided loads at that stage is not easy.
>
> I am trying to implement this in GCC and wondering what the
> preferred and acceptable way to implement it is. Any thoughts?
>
> Thanks,
> Kugan


Re: Handling prefetcher tag collisions while allocating registers

2017-10-24 Thread Richard Biener
On Tue, Oct 24, 2017 at 1:44 AM, Kugan Vivekanandarajah wrote:
> Hi All,
>
> I am wondering if there is any way we can prefer certain registers in
> register allocation. That is, I want to have some way of recording
> register allocation decisions (for loads in a loop that are accessed in
> steps) and use this to influence register allocation of other loads
> (again, those that are accessed in steps).
>
> This is for architectures (like Falkor AArch64) that use hardware
> prefetchers that use signatures of the loads to lock onto and tune
> prefetching parameters. Ideally, if the loads are from the same
> stream, they should have the same signature, and if they are from
> different streams, they should have different signatures. The destination
> register, base register and offset are used in the signature. Therefore,
> selecting a different register can influence this.
>
> In LLVM, this is implemented as a machine-specific pass that runs
> after register allocation. It then inserts mov instructions with
> scratch registers to manage this. We can do a machine reorg pass in
> GCC, but detecting strided loads at that stage is not easy.
>
> I am trying to implement this in GCC and wondering what the
> preferred and acceptable way to implement it is. Any thoughts?

I see nothing but a machine-dependent reorg pass that can do this.

RA and CSE should already end up using the same base registers
for equal streams if possible.  Forcing the same addressing mode
is more difficult, I guess, but usually this should happen.

As Bin suggests, whatever Falkor does seems to be somewhat stupid.

Richard.

> Thanks,
> Kugan


Re: Handling prefetcher tag collisions while allocating registers

2017-10-24 Thread Kugan Vivekanandarajah
Hi Bin,

On 24 October 2017 at 18:29, Bin.Cheng wrote:
> On Tue, Oct 24, 2017 at 12:44 AM, Kugan Vivekanandarajah wrote:
>> Hi All,
>>
>> I am wondering if there is any way we can prefer certain registers in
>> register allocation. That is, I want to have some way of recording
>> register allocation decisions (for loads in a loop that are accessed in
>> steps) and use this to influence register allocation of other loads
>> (again, those that are accessed in steps).
>>
>> This is for architectures (like Falkor AArch64) that use hardware
>> prefetchers that use signatures of the loads to lock onto and tune
>> prefetching parameters. Ideally, if the loads are from the same
>> stream, they should have the same signature, and if they are from
>> different streams, they should have different signatures. The destination
>> register, base register and offset are used in the signature. Therefore,
>> selecting a different register can influence this.
> I wonder why the destination register is used in the signature.  In an extreme
> case, a load in a loop can be unrolled and then allocated to different dest
> registers. Forcing the same dest register could be too restrictive.

My description was very simplified. The signature is based on part of the
register number. Thus, two registers can have the same signature. What we
don't want is collisions between loads that are from two different
memory streams. So this is not an issue.
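
To make the collision case concrete, here is a minimal sketch in C. The bit widths and field layout are assumptions chosen purely for illustration; the actual Falkor tag format is not described in this thread. The names load_info, load_tag and tags_collide are hypothetical and do not exist in GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one load; all names are illustrative.  */
struct load_info
{
  unsigned dest_regno;  /* destination register number */
  unsigned base_regno;  /* base address register number */
  int64_t offset;       /* immediate offset */
  int stream_id;        /* memory stream the load belongs to */
};

/* ASSUMED tag layout: low 2 bits of the destination register, low 4 bits
   of the base register and low 2 bits of the offset.  Because only part
   of each register number is used, distinct registers can share a tag.  */
static unsigned
load_tag (const struct load_info *l)
{
  return ((l->dest_regno & 0x3) << 6)
	 | ((l->base_regno & 0xf) << 2)
	 | (unsigned) ((uint64_t) l->offset & 0x3);
}

/* A collision is only a problem when two loads from DIFFERENT streams
   end up with the same tag.  Same-stream loads sharing a tag are fine.  */
static bool
tags_collide (const struct load_info *a, const struct load_info *b)
{
  return a->stream_id != b->stream_id && load_tag (a) == load_tag (b);
}
```

With this assumed layout, loads into x0 and x4 using the same base and offset get the same tag, so they collide if they belong to different streams; picking x5 for the second load would avoid the collision.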

Thanks,
Kugan

>
> Thanks,
> bin
>
>>
>> In LLVM, this is implemented as a machine-specific pass that runs
>> after register allocation. It then inserts mov instructions with
>> scratch registers to manage this. We can do a machine reorg pass in
>> GCC, but detecting strided loads at that stage is not easy.
>>
>> I am trying to implement this in GCC and wondering what the
>> preferred and acceptable way to implement it is. Any thoughts?
>>
>> Thanks,
>> Kugan


Re: Handling prefetcher tag collisions while allocating registers

2017-10-24 Thread Vladimir Makarov

On 10/23/2017 07:44 PM, Kugan Vivekanandarajah wrote:

> Hi All,
>
> I am wondering if there is any way we can prefer certain registers in
> register allocation. That is, I want to have some way of recording
> register allocation decisions (for loads in a loop that are accessed in
> steps) and use this to influence register allocation of other loads
> (again, those that are accessed in steps).
In some cases, you can achieve assigning the same register to
particular pseudos.  Please see threads in ira-color.c.  But the
pseudos should be colorable at the same time.

> This is for architectures (like Falkor AArch64) that use hardware
> prefetchers that use signatures of the loads to lock onto and tune
> prefetching parameters. Ideally, if the loads are from the same
> stream, they should have the same signature, and if they are from
> different streams, they should have different signatures. The destination
> register, base register and offset are used in the signature. Therefore,
> selecting a different register can influence this.
>
> In LLVM, this is implemented as a machine-specific pass that runs
> after register allocation. It then inserts mov instructions with
> scratch registers to manage this. We can do a machine reorg pass in
> GCC, but detecting strided loads at that stage is not easy.
I guess finding the strided loads in GCC RA has the same difficulty as
in a reorg pass after it.


I think it is better to implement it as a separate pass because it is 
too specific and RA is already complicated.  Implementing it in RA will 
have some constraints (not all cases will be implemented) and you will 
need to overcome many obstacles because RA might change its decisions in 
many places after initial assignments (e.g. in 
ira-color.c::improve_allocation and in many LRA places).
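
As a rough sketch of what such a separate post-RA fix-up could look like, the toy C model below walks already-allocated loads and retargets a colliding load to a scratch register. Everything here is an assumption for illustration: the tag layout is made up, the stream_id field presumes strided loads have already been identified (which is the hard part noted above), and none of the names (ra_load, assumed_tag, fix_tag_collisions) are real GCC interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a load after register allocation.  */
struct ra_load
{
  unsigned dest_regno;  /* destination hard register */
  unsigned base_regno;  /* base address hard register */
  long offset;          /* immediate offset */
  int stream_id;        /* assumed to be known from earlier analysis */
};

/* ASSUMED tag layout, for illustration only.  */
static unsigned
assumed_tag (unsigned dest, unsigned base, long offset)
{
  return ((dest & 0x3) << 6) | ((base & 0xf) << 2)
	 | (unsigned) ((unsigned long) offset & 0x3);
}

/* True if load I, were it to use destination DEST, would share a tag
   with an earlier load that belongs to a different stream.  */
static bool
collides_with_earlier (const struct ra_load *loads, size_t i, unsigned dest)
{
  unsigned tag = assumed_tag (dest, loads[i].base_regno, loads[i].offset);
  for (size_t j = 0; j < i; j++)
    if (loads[j].stream_id != loads[i].stream_id
	&& assumed_tag (loads[j].dest_regno, loads[j].base_regno,
			loads[j].offset) == tag)
      return true;
  return false;
}

/* Retarget each colliding load to the first scratch register that avoids
   the collision; loads with no suitable scratch register are left alone.
   Returns the number of loads changed.  A real pass would also have to
   emit a mov from the scratch register to the original destination.  */
static size_t
fix_tag_collisions (struct ra_load *loads, size_t n,
		    const unsigned *scratch, size_t n_scratch)
{
  size_t changed = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!collides_with_earlier (loads, i, loads[i].dest_regno))
	continue;
      for (size_t s = 0; s < n_scratch; s++)
	if (!collides_with_earlier (loads, i, scratch[s]))
	  {
	    loads[i].dest_regno = scratch[s];
	    changed++;
	    break;
	  }
    }
  return changed;
}
```

The sketch ignores liveness of the scratch registers and the cost of the extra mov, both of which a real pass would have to account for.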



> I am trying to implement this in GCC and wondering what the
> preferred and acceptable way to implement it is. Any thoughts?



My preference would be a separate pass but the final decision is up to 
you.  I am just sharing my thoughts.