On Fri, 2020-09-11 at 12:14 +0100, Richard Sandiford wrote: > Ilya Leoshkevich <i...@linux.ibm.com> writes: > > On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote: > > > On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote: > > > > Ilya Leoshkevich via Gcc <gcc@gcc.gnu.org> writes: > > > > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote: > > > > > > Hi Ilya, > > > > > > > > > > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich > > > > > > via > > > > > > Gcc > > > > > > wrote: > > > > > > > I have a vector pseudo containing a single 128-bit value > > > > > > > (V1TFmode) > > > > > > > and > > > > > > > I need to access its last 64 bits (DFmode). Which of the > > > > > > > two > > > > > > > options > > > > > > > is better? > > > > > > > > > > > > > > (subreg:DF (reg:V1TF) 8) > > > > > > > > > > > > > > or > > > > > > > > > > > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel > > > > > > > [(const_int > > > > > > > 1)])) > > > > > > > > > > > > > > If I use the first one, I run into a problem with > > > > > > > set_noop_p > > > > > > > (): it > > > > > > > thinks that > > > > > > > > > > > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) > > > > > > > 8)) > > > > > > > > > > > > > > is a no-op, because it doesn't check the mode after > > > > > > > stripping > > > > > > > the > > > > > > > subreg: > > > > > > > > > > > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616 > > > > > > > > > > > > > > However this is not correct, because SET_DEST is the > > > > > > > second > > > > > > > register in > > > > > > > a register pair, and SET_SRC is half of a vector register > > > > > > > that > > > > > > > overlaps > > > > > > > the first register in the corresponding pair. So it looks > > > > > > > as > > > > > > > if > > > > > > > mode > > > > > > > needs to be considered there. > > > > > > > > > > > > Yes. > > > > > > > > > > > > > This helps: > > > > > > > > > > > > > > --- a/gcc/rtlanal.c > > > > > > > +++ b/gcc/rtlanal.c > > > > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set) > > > > > > > return 0; > > > > > > > src = SUBREG_REG (src); > > > > > > > dst = SUBREG_REG (dst); > > > > > > > + if (GET_MODE (src) != GET_MODE (dst)) > > > > > > > + return 0; > > > > > > > } > > > > > > > > > > > > > > but I'm not sure whether I'm not missing something about > > > > > > > subreg > > > > > > > semantics in the first place. > > > > > > > > > > > > You probably should just see if both modes are the same > > > > > > number > > > > > > of > > > > > > hard > > > > > > registers? HARD_REGNO_NREGS. > > > > > > > > > > I've refined my patch as follows: > > > > > > > > > > --- a/gcc/rtlanal.c > > > > > +++ b/gcc/rtlanal.c > > > > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set) > > > > > return 0; > > > > > src = SUBREG_REG (src); > > > > > dst = SUBREG_REG (dst); > > > > > + if (REG_P (src) && HARD_REGISTER_P (src) && REG_P > > > > > (dst) > > > > > + && HARD_REGISTER_P (dst) > > > > > + && hard_regno_nregs (REGNO (src), GET_MODE (src)) > > > > > + != hard_regno_nregs (REGNO (dst), GET_MODE > > > > > (dst))) > > > > > + return 0; > > > > > } > > > > > > > > I think checking the mode would be safer. Having the same > > > > number > > > > of registers doesn't mean that the bits are distributed across > > > > the > > > > registers in the same way. > > > > > > Yeah, that's what I was trying to express with this hypothetical > > > machine example. On the other hand, checking mode is too > > > pessimistic. > > > E.g. if we talk about s390 GPRs, then considering > > > > > > (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4)) > > > > Sorry, bad example: here the hard register modes actually match. > > But it's probably possible to come up with something similar, where > > the hard reg is accessed with different modes, but when we add > > subregs > > on top, then we end up selecting the same bits. > > Wel, in general, (subreg (reg …) …) shouldn't exist for hard > registers. > They should just be folded to the underlying (reg …) instead. > So in the above I'd expect to see: > > (set (reg:SI %r0) (reg:SI %r0)) > > instead. I remember there were cases on e.g. e500 where the target > prevented the fold from taking place due to layout issues, but wasn't > sure whether the above FP example above was something similar. > > So… > > > > a no-op is fine from my perspective. So having a more > > > restrictive > > > check might be desirable. Is there a way to ask the backend how > > > the > > > subreg bits are distributed? > > …if the bits are distributed evenly in an obvious way, the hard > register > subreg should already have been folded to a reg. I think subregs of > hard registers are rare enough that we should just keep it simple > and safe.
Ok! I'll make a proper patch out of this and post to gcc-patches then. > > > > > Out of interest, why can't the subregs in the example above get > > > > folded down to hard registers? > > > > > > I think this is because the offsets are not 0. I could imagine > > > folding > > > (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a > > > backend hook for this. Does anything like this exist? Also, can > > > (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply > > > the second doubleword of 128-bit %v0 vector register. > > For multi-register values with a regular layout, > simplify_subreg_regno > (via subreg_get_info) uses various target queries to work out which > hard registers are actually being accessed. Is the subreg above > for big-endian or little-endian? (I hope big-endian given the > 8 offset.) This is big-endian (s390). Thanks for the info, register-related target queries seem to be pretty extensive. I still couldn't see anything that would answer e.g. "what bits are accessed by (reg:TF %f0)" with "bits 0-64 of %f0 and bits 0-64 of %f2", but that level of detail is probably not needed anyway.