On Mon, Jan 15, 2024 at 06:25:13PM +0530, Ajit Agarwal wrote:
> Also Mike and Kewen suggested to use this pass before IRA register
> allocator. They are in To List. They have other concerns doing after
> register allocator.
>
> They have responded in other mail Chain.

The problem with doing it after register allocation is that it limits
the hit rate to the cases where the register allocator happened to guess
right and allocated adjacent registers.

Note, the PowerPC has some twists:

1) load/store vector pair must use an even/odd VSX register pair.

2) Some instructions only operate on traditional FPR registers (VSX
   registers 0..31) and others only operate on traditional Altivec
   registers (VSX registers 32..63). I.e. if you are doing a load vector
   pair, and you are going to do, say, a V2DI vector add, you need to
   load the vector pair into Altivec registers to avoid having to do a
   copy operation.

In general, I tend to feel that stuffing things into a larger register
and then using SUBREG will often generate other moves. On the PowerPC
right now, we can't even use SUBREG of OOmode (the 256-bit opaque type),
but Peter has patches to deal with some of the issues. But at the
moment, we don't have support for expressing this load such that
register allocation can handle it.

Rather than using a large register mode, I tend to feel that we should
enhance match_parallel so that register allocation can allocate the
registers sequentially. Now, I haven't looked at match_parallel for
15-20 years, but my sense was that it only worked for fixed registers
generated elsewhere (such as for the load/store string instruction
support).

I.e.
rather than doing something like:

    (set (reg:OO <oo_reg1>)
         (mem:OO <oo_mem1>))
    (set (reg:V2DF <v2df_reg1>)
         (subreg:V2DF (reg:OO <oo_reg1>) 0))
    (set (reg:V2DF <v2df_reg2>)
         (subreg:V2DF (reg:OO <oo_reg1>) 16))

    ; do stuff involving v2df_reg1 and v2df_reg2

    (clobber (reg:OO <oo_reg2>))
    (set (subreg:V2DF (reg:OO <oo_reg2>) 0)
         (reg:V2DF <v2df_reg1>))
    (set (subreg:V2DF (reg:OO <oo_reg2>) 16)
         (reg:V2DF <v2df_reg2>))
    (set (mem:OO <oo_mem2>)
         (reg:OO <oo_reg2>))

We would do:

    (parallel [(set (reg:V2DF <v2df_reg1>)
                    (mem:V2DF <v2df_mem1>))
               (set (reg:V2DF <v2df_reg2>)
                    (mem:V2DF <v2df_mem2>))])

    ; do stuff involving v2df_reg1 and v2df_reg2

    (parallel [(set (mem:V2DF <v2df_mem3>)
                    (reg:V2DF <v2df_reg1>))
               (set (mem:V2DF <v2df_mem4>)
                    (reg:V2DF <v2df_reg2>))])

Now in those two parallels above, we would need to use match_parallel to
ensure that the registers are allocated sequentially (and on the
PowerPC, start on an even VSX register), and that the addresses are
bumped up by 16 bytes.

Ideally, the combiner should try to combine things, but it may be
simpler to use a separate MD pass.

It would be nice if we had a standard constraint mechanism like %<n>
that says use %<n> but add 1/2/3/etc. to the register number if it is a
REG, or a size*number added to a memory address if it is a MEM.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
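
As a footnote to the match_parallel idea above: the condition such a
parallel would have to satisfy can be modelled outside of GCC. The
following is only a rough Python sketch, not GCC code and not an
existing predicate. The names Reg, Mem, Set and is_vector_pair_parallel
are hypothetical, memory addresses are simplified to a base register
plus a constant offset, and endianness (which may affect which register
of the pair gets the lower address) is ignored.

```python
from dataclasses import dataclass

VECTOR_BYTES = 16  # size of one V2DF vector, as in the RTL above


@dataclass(frozen=True)
class Reg:
    regno: int  # VSX register number, 0..63 (hypothetical model)


@dataclass(frozen=True)
class Mem:
    base: int    # register number holding the base address
    offset: int  # constant byte offset from the base


@dataclass(frozen=True)
class Set:
    dest: object  # Reg for a load, Mem for a store
    src: object   # Mem for a load, Reg for a store


def is_vector_pair_parallel(sets):
    """Return True if `sets` models a valid load or store vector pair."""
    if len(sets) != 2:
        return False

    # Work out whether this is a load (reg <- mem) or a store (mem <- reg).
    if all(isinstance(s.dest, Reg) and isinstance(s.src, Mem) for s in sets):
        regs = [s.dest for s in sets]
        mems = [s.src for s in sets]
    elif all(isinstance(s.dest, Mem) and isinstance(s.src, Reg) for s in sets):
        regs = [s.src for s in sets]
        mems = [s.dest for s in sets]
    else:
        return False

    # The pair must start on an even VSX register and be sequential.
    if regs[0].regno % 2 != 0 or regs[1].regno != regs[0].regno + 1:
        return False

    # The second address must be the first one bumped up by 16 bytes.
    return (mems[0].base == mems[1].base
            and mems[1].offset == mems[0].offset + VECTOR_BYTES)
```

For example, a load into VSX registers 34 and 35 from offsets 0 and 16
off the same base register passes the check, while the same load into
registers 33 and 34 is rejected because the pair starts on an odd
register. Since an even/odd pair never straddles the 31/32 boundary, a
pair that passes is always entirely in the FPR half or entirely in the
Altivec half.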