Re: Handling prefetcher tag collisions while allocating registers
On Tue, Oct 24, 2017 at 12:44 AM, Kugan Vivekanandarajah wrote:
> Hi All,
>
> I am wondering if there is any way we can prefer certain registers in
> register allocation. That is, I want to have some way of recording
> register allocation decisions (for loads in loops that are accessed in
> steps) and use this to influence register allocation of other loads
> (again that are accessed in steps).
>
> This is for architectures (like Falkor AArch64) that use hardware
> prefetchers that use signatures of the loads to lock into and tune
> prefetching parameters. Ideally, if the loads are from the same
> stream, they should have the same signature and if they are from
> different streams, they should have different signatures. Destination,
> base register and offset are used in the signature. Therefore,
> selecting a different register can influence this.

I wonder why the destination register is used in the signature. In an
extreme case, a load in a loop can be unrolled and then allocated to
different destination registers. Forcing the same destination register
could be too restrictive.

Thanks,
bin

>
> In LLVM, this is implemented as a machine-specific pass that runs
> after register allocation. It then inserts mov instructions with
> scratch registers to manage this. We can do a machine reorg pass in
> gcc but detecting strided loads at that stage is not easy.
>
> I am trying to implement this in gcc and wondering what is the
> preferred and acceptable way to implement this. Any thoughts?
>
> Thanks,
> Kugan
Re: Handling prefetcher tag collisions while allocating registers
On Tue, Oct 24, 2017 at 1:44 AM, Kugan Vivekanandarajah wrote:
> Hi All,
>
> I am wondering if there is any way we can prefer certain registers in
> register allocation. That is, I want to have some way of recording
> register allocation decisions (for loads in loops that are accessed in
> steps) and use this to influence register allocation of other loads
> (again that are accessed in steps).
>
> This is for architectures (like Falkor AArch64) that use hardware
> prefetchers that use signatures of the loads to lock into and tune
> prefetching parameters. Ideally, if the loads are from the same
> stream, they should have the same signature and if they are from
> different streams, they should have different signatures. Destination,
> base register and offset are used in the signature. Therefore,
> selecting a different register can influence this.
>
> In LLVM, this is implemented as a machine-specific pass that runs
> after register allocation. It then inserts mov instructions with
> scratch registers to manage this. We can do a machine reorg pass in
> gcc but detecting strided loads at that stage is not easy.
>
> I am trying to implement this in gcc and wondering what is the
> preferred and acceptable way to implement this. Any thoughts?

I see nothing other than a machine-dependent reorg pass that can do
this. RA and CSE should already end up using the same base registers
for equal streams if possible. Forcing the same addressing mode is more
difficult, I guess, but usually this should happen.

As Bin suggests, whatever Falkor does seems to be somewhat stupid.

Richard.

> Thanks,
> Kugan
Re: Handling prefetcher tag collisions while allocating registers
Hi Bin,

On 24 October 2017 at 18:29, Bin.Cheng wrote:
> On Tue, Oct 24, 2017 at 12:44 AM, Kugan Vivekanandarajah wrote:
>> Hi All,
>>
>> I am wondering if there is any way we can prefer certain registers in
>> register allocation. That is, I want to have some way of recording
>> register allocation decisions (for loads in loops that are accessed in
>> steps) and use this to influence register allocation of other loads
>> (again that are accessed in steps).
>>
>> This is for architectures (like Falkor AArch64) that use hardware
>> prefetchers that use signatures of the loads to lock into and tune
>> prefetching parameters. Ideally, if the loads are from the same
>> stream, they should have the same signature and if they are from
>> different streams, they should have different signatures. Destination,
>> base register and offset are used in the signature. Therefore,
>> selecting a different register can influence this.
> I wonder why the destination register is used in the signature. In an
> extreme case, a load in a loop can be unrolled and then allocated to
> different destination registers. Forcing the same destination register
> could be too restrictive.

My description is very simplified. The signature is based on only part
of the register number, so two registers can have the same signature.
What we don't want is collisions between loads from two different
memory streams. So this is not an issue.

Thanks,
Kugan

>
> Thanks,
> bin
>
>>
>> In LLVM, this is implemented as a machine-specific pass that runs
>> after register allocation. It then inserts mov instructions with
>> scratch registers to manage this. We can do a machine reorg pass in
>> gcc but detecting strided loads at that stage is not easy.
>>
>> I am trying to implement this in gcc and wondering what is the
>> preferred and acceptable way to implement this. Any thoughts?
>>
>> Thanks,
>> Kugan
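
To make the point about partial register numbers concrete, here is a minimal sketch of how such a signature might behave. The bit layout in `make_tag` (low 4 bits of the destination, low 4 bits of the base, low 6 bits of the offset) is an assumption chosen for illustration, not a statement of what the Falkor hardware actually does.

```python
def make_tag(dest, base, offset):
    """Hypothetical prefetcher tag built from parts of the load's operands.

    The field widths are assumptions for illustration only.
    """
    return (dest & 0xF) | ((base & 0xF) << 4) | ((offset & 0x3F) << 8)

# Same stream after unrolling: x1 and x17 share their low 4 bits,
# so the two loads still map to the same tag.
same_stream_a = make_tag(dest=1, base=2, offset=0)    # ldr x1,  [x2]
same_stream_b = make_tag(dest=17, base=2, offset=0)   # ldr x17, [x2]

# A different stream based on x18 collides, because 18 & 0xF == 2.
other_stream = make_tag(dest=1, base=18, offset=0)    # ldr x1, [x18]

# Choosing a different destination register breaks the collision.
other_stream_fixed = make_tag(dest=3, base=18, offset=0)  # ldr x3, [x18]
```

Under this assumed layout, unrolled loads from one stream can use different destination registers without changing the tag, while an unlucky register choice for an unrelated stream produces the collision that should be avoided.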
Re: Handling prefetcher tag collisions while allocating registers
On 10/23/2017 07:44 PM, Kugan Vivekanandarajah wrote:
> Hi All,
>
> I am wondering if there is any way we can prefer certain registers in
> register allocation. That is, I want to have some way of recording
> register allocation decisions (for loads in loops that are accessed in
> steps) and use this to influence register allocation of other loads
> (again that are accessed in steps).

In some cases, you can get the same register assigned to particular
pseudos. Please see threads in ira-color.c. But the pseudos must be
colorable at the same time.

> This is for architectures (like Falkor AArch64) that use hardware
> prefetchers that use signatures of the loads to lock into and tune
> prefetching parameters. Ideally, if the loads are from the same
> stream, they should have the same signature and if they are from
> different streams, they should have different signatures. Destination,
> base register and offset are used in the signature. Therefore,
> selecting a different register can influence this.
>
> In LLVM, this is implemented as a machine-specific pass that runs
> after register allocation. It then inserts mov instructions with
> scratch registers to manage this. We can do a machine reorg pass in
> gcc but detecting strided loads at that stage is not easy.

I guess finding the strided loads in GCC RA is as difficult as finding
them in a reorg pass after it.

I think it is better to implement it as a separate pass because it is
too specific and RA is already complicated. Implementing it in RA will
have some constraints (not all cases will be handled) and you will need
to overcome many obstacles, because RA might change its decisions in
many places after the initial assignments (e.g. in
ira-color.c::improve_allocation and in many LRA places).

> I am trying to implement this in gcc and wondering what is the
> preferred and acceptable way to implement this. Any thoughts?

My preference would be a separate pass, but the final decision is up to
you. I am just sharing my thoughts.
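
As a rough illustration of what a separate post-RA pass would have to do, the sketch below finds tags shared by more than one stream and picks a scratch base register that yields an unused tag, loosely modelled on the mov-to-scratch approach described for LLVM above. Everything here is hypothetical: the tag layout in `make_tag`, the `(stream, dest, base, offset)` tuples, and the helper names are assumptions, and the sketch presumes the loads have already been grouped into streams, which is the hard part discussed in this thread.

```python
from collections import defaultdict

def make_tag(dest, base, offset):
    # Hypothetical tag layout; field widths are assumptions for illustration.
    return (dest & 0xF) | ((base & 0xF) << 4) | ((offset & 0x3F) << 8)

def find_collisions(loads):
    """Return {tag: streams} for every tag used by more than one stream.

    loads is a list of (stream_id, dest_reg, base_reg, offset) tuples.
    """
    streams_by_tag = defaultdict(set)
    for stream, dest, base, offset in loads:
        streams_by_tag[make_tag(dest, base, offset)].add(stream)
    return {tag: s for tag, s in streams_by_tag.items() if len(s) > 1}

def pick_scratch_base(load, used_tags, scratch_regs):
    """Pick a scratch base register whose tag is not already in use.

    Models copying the base into a scratch register and loading through it.
    Returns None when no candidate avoids the collision.
    """
    _, dest, _, offset = load
    for reg in scratch_regs:
        if make_tag(dest, reg, offset) not in used_tags:
            return reg
    return None

# Stream A is unrolled (x1 and x17); stream B collides with it via x18.
loads = [("A", 1, 2, 0), ("A", 17, 2, 0), ("B", 1, 18, 0)]
collisions = find_collisions(loads)
used_tags = {make_tag(d, b, o) for _, d, b, o in loads}
scratch = pick_scratch_base(loads[2], used_tags, scratch_regs=[9, 10])
```

Here the colliding load from stream B would be rewritten to go through x9. A real pass would also have to prove that the scratch register is free at that point and weigh the cost of the extra mov.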