Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

Michael Meissner Thu, 18 Jan 2024 20:19:22 -0800

On Mon, Jan 15, 2024 at 06:25:13PM +0530, Ajit Agarwal wrote:
> Also Mike and Kewen suggested to use this pass before IRA register
> allocator. They are in To List. They have other concerns doing after
> register allocator.
> 
> They have responded in other mail Chain.


The problem with doing it after register allocation is it limits the hit rate
to the situation where the register allocation happened to guess right, and
allocated adjacent registers.

Note, the PowerPC has some twists:

1) load/store vector pair must use an even/odd VSX register pair.

2) Some instructions only operate on traditional FPR registers (VSX registers
0..31) and others only operate on traditional Altivec registers (VSX registers
32..63).  I.e. if you are doing a load vector pair, and you are going to do say
a V2DI vector add, you need to load the vector pair into Altivec registers to
avoid having to do a copy operation.
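
To make the two constraints concrete, here is a minimal sketch in C.  The
names (vsx_half, vsx_half_of, vsx_pair_start_ok_p) are made up for
illustration and are not GCC identifiers; it only models the numbering
described above (VSX 0..31 overlap the FPRs, 32..63 overlap the Altivec
registers, and a pair starts on an even register).

```c
#include <stdbool.h>

/* Hypothetical names for illustration; these are not GCC identifiers.  */
enum vsx_half { VSX_HALF_FPR, VSX_HALF_ALTIVEC };

/* VSX registers 0..31 overlap the traditional FPRs, 32..63 overlap the
   traditional Altivec registers.  */
static enum vsx_half
vsx_half_of (unsigned vsx_regno)
{
  return vsx_regno < 32 ? VSX_HALF_FPR : VSX_HALF_ALTIVEC;
}

/* Return true if VSX_REGNO can be the first register of a vector pair
   whose contents will be used by instructions restricted to WANTED.
   The pair must start on an even register; since 32 is even, an
   even/odd pair never straddles the two halves.  */
static bool
vsx_pair_start_ok_p (unsigned vsx_regno, enum vsx_half wanted)
{
  return vsx_regno < 63
         && (vsx_regno & 1) == 0
         && vsx_half_of (vsx_regno) == wanted;
}
```

With this model, vsx_pair_start_ok_p (34, VSX_HALF_ALTIVEC) holds, while
register 33 (odd) or register 34 with VSX_HALF_FPR is rejected.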

In general, I tend to feel stuffing things into a larger register and then
using SUBREG is often going to generate other moves.  On the PowerPC
right now, we can't even use SUBREG of OOmode (the 256-bit opaque type), but
Peter has patches to deal with some of the issues.

But at the moment, we don't have support for expressing this load such that
register allocation can handle it.

Rather than using a large register mode, I tend to feel that we should enhance
match_parallel so that register allocation can allocate the registers
sequentially.  Now, I haven't looked at match_parallel for 15-20 years, but my
sense was it only worked for fixed registers generated elsewhere (such as for
the load/store string instruction support).

I.e. rather than doing something like:

        (set (reg:OO <oo_reg1>)
             (mem:OO <oo_mem1>))

        (set (reg:V2DF <v2df_reg1>)
             (subreg:V2DF (reg:OO <oo_reg1>) 0))

        (set (reg:V2DF <v2df_reg2>)
             (subreg:V2DF (reg:OO <oo_reg1>) 16))

        ; do stuff involving v2df_reg1 and v2df_reg2

        (clobber (reg:OO <oo_reg2>))

        (set (subreg:V2DF (reg:OO <oo_reg2>) 0)
             (reg:V2DF <v2df_reg1>))

        (set (subreg:V2DF (reg:OO <oo_reg2>) 16)
             (reg:V2DF <v2df_reg2>))

        (set (mem:OO <oo_mem2>)
             (reg:OO <oo_reg2>))

We would do:

        (parallel [(set (reg:V2DF <v2df_reg1>)
                        (mem:V2DF <v2df_mem1>))
                   (set (reg:V2DF <v2df_reg2>)
                        (mem:V2DF <v2df_mem2>))])

        ; do stuff involving v2df_reg1 and v2df_reg2

        (parallel [(set (mem:V2DF <v2df_mem3>)
                        (reg:V2DF <v2df_reg1>))
                   (set (mem:V2DF <v2df_mem4>)
                        (reg:V2DF <v2df_reg2>))])

Now in those two parallels above, we would need to use match_parallel to ensure
that the registers are allocated sequentially (and in the PowerPC, start on an
even VSX register), and the addresses are bumped up by 16 bytes.
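
As a rough illustration of the check such a match_parallel predicate would
have to make, here is a standalone sketch in C.  It is not GCC code: the
struct vec_access type and the vec_pair_ok_p function are hypothetical, each
access is reduced to a VSX register number and a byte offset from a common
base, and endian-specific ordering is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: a V2DF load or store described by
   the VSX register number (0..63) and the byte offset from a common
   base address.  */
struct vec_access
{
  unsigned vsx_regno;
  int64_t offset;
};

#define VECTOR_SIZE 16	/* bytes in one 128-bit vector */

/* Return true if the two accesses could be done as one vector pair
   operation: the first register is an even VSX register, the second is
   the next register, and the second address is 16 bytes past the
   first.  */
static bool
vec_pair_ok_p (struct vec_access first, struct vec_access second)
{
  if (first.vsx_regno > 62 || (first.vsx_regno & 1) != 0)
    return false;
  if (second.vsx_regno != first.vsx_regno + 1)
    return false;
  return second.offset == first.offset + VECTOR_SIZE;
}
```

For example, registers 34 and 35 at offsets 0 and 16 pass, while registers 35
and 36 (odd start) or offsets 0 and 32 (not adjacent) do not.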

Ideally, the combiner should try to combine things, but it may be simpler to
use a separate MD pass.

It would be nice if we had a standard constraint mechanism like %<n> that says
use %<n> but add 1/2/3/etc. to the register number if it is a REG, or a
size*number added to a memory address if it is a MEM.
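
What I have in mind can be sketched roughly as follows in C.  This is only a
model of the idea: struct operand and operand_plus_n are hypothetical names,
no such mechanism exists in GCC today, and a memory address is reduced to a
byte offset.

```c
#include <stdint.h>

/* Hypothetical model of an operand; not a GCC data structure.  */
enum operand_kind { OPERAND_REG, OPERAND_MEM };

struct operand
{
  enum operand_kind kind;
  unsigned regno;	/* used when kind == OPERAND_REG */
  int64_t offset;	/* used when kind == OPERAND_MEM */
  unsigned size;	/* size in bytes of the operand's mode */
};

/* Return operand OP advanced by N elements: for a REG add N to the
   register number, for a MEM add N * size to the address.  */
static struct operand
operand_plus_n (struct operand op, unsigned n)
{
  struct operand result = op;

  if (op.kind == OPERAND_REG)
    result.regno = op.regno + n;
  else
    result.offset = op.offset + (int64_t) op.size * n;
  return result;
}
```

So a V2DF operand in register 34 plus 1 names register 35, and a 16-byte
memory operand at offset 0 plus 1 names the memory at offset 16.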

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
