On Fri, 2020-09-11 at 12:14 +0100, Richard Sandiford wrote:
> Ilya Leoshkevich <i...@linux.ibm.com> writes:
> > On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote:
> > > On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
> > > > Ilya Leoshkevich via Gcc <gcc@gcc.gnu.org> writes:
> > > > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > > > > > Hi Ilya,
> > > > > > 
> > > > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich
> > > > > > via
> > > > > > Gcc
> > > > > > wrote:
> > > > > > > I have a vector pseudo containing a single 128-bit value
> > > > > > > (V1TFmode)
> > > > > > > and
> > > > > > > I need to access its last 64 bits (DFmode). Which of the
> > > > > > > two
> > > > > > > options
> > > > > > > is better?
> > > > > > > 
> > > > > > > (subreg:DF (reg:V1TF) 8)
> > > > > > > 
> > > > > > > or
> > > > > > > 
> > > > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel
> > > > > > > [(const_int
> > > > > > > 1)]))
> > > > > > > 
> > > > > > > If I use the first one, I run into a problem with
> > > > > > > set_noop_p
> > > > > > > (): it
> > > > > > > thinks that
> > > > > > > 
> > > > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0)
> > > > > > > 8))
> > > > > > > 
> > > > > > > is a no-op, because it doesn't check the mode after
> > > > > > > stripping
> > > > > > > the
> > > > > > > subreg:
> > > > > > > 
> > > > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > > > > > 
> > > > > > > However this is not correct, because SET_DEST is the
> > > > > > > second
> > > > > > > register in
> > > > > > > a register pair, and SET_SRC is half of a vector register
> > > > > > > that
> > > > > > > overlaps
> > > > > > > the first register in the corresponding pair. So it looks
> > > > > > > as
> > > > > > > if
> > > > > > > mode
> > > > > > > needs to be considered there.
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > This helps:
> > > > > > > 
> > > > > > > --- a/gcc/rtlanal.c
> > > > > > > +++ b/gcc/rtlanal.c
> > > > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > > > > > >         return 0;
> > > > > > >        src = SUBREG_REG (src);
> > > > > > >        dst = SUBREG_REG (dst);
> > > > > > > +      if (GET_MODE (src) != GET_MODE (dst))
> > > > > > > +       return 0;
> > > > > > >      }
> > > > > > > 
> > > > > > > but I'm not sure whether I'm not missing something about
> > > > > > > subreg
> > > > > > > semantics in the first place.
> > > > > > 
> > > > > > You probably should just see if both modes are the same
> > > > > > number
> > > > > > of
> > > > > > hard
> > > > > > registers?  HARD_REGNO_NREGS.
> > > > > 
> > > > > I've refined my patch as follows:
> > > > > 
> > > > > --- a/gcc/rtlanal.c
> > > > > +++ b/gcc/rtlanal.c
> > > > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
> > > > >         return 0;
> > > > >        src = SUBREG_REG (src);
> > > > >        dst = SUBREG_REG (dst);
> > > > > +      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P
> > > > > (dst)
> > > > > +         && HARD_REGISTER_P (dst)
> > > > > +         && hard_regno_nregs (REGNO (src), GET_MODE (src))
> > > > > +                != hard_regno_nregs (REGNO (dst), GET_MODE
> > > > > (dst)))
> > > > > +       return 0;
> > > > >      }
> > > > 
> > > > I think checking the mode would be safer.  Having the same
> > > > number
> > > > of registers doesn't mean that the bits are distributed across
> > > > the
> > > > registers in the same way.
> > > 
> > > Yeah, that's what I was trying to express with this hypothetical
> > > machine example.  On the other hand, checking mode is too
> > > pessimistic.
> > > E.g. if we talk about s390 GPRs, then considering
> > > 
> > > (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4))
> > 
> > Sorry, bad example: here the hard register modes actually match.
> > But it's probably possible to come up with something similar, where
> > the hard reg is accessed with different modes, but when we add
> > subregs
> > on top, then we end up selecting the same bits.
> 
> Wel, in general, (subreg (reg …) …) shouldn't exist for hard
> registers.
> They should just be folded to the underlying (reg …) instead.
> So in the above I'd expect to see:
> 
>   (set (reg:SI %r0) (reg:SI %r0))
> 
> instead.  I remember there were cases on e.g. e500 where the target
> prevented the fold from taking place due to layout issues, but wasn't
> sure whether the above FP example above was something similar.
> 
> So…
> 
> > > a no-op is fine from my perspective.  So having a more
> > > restrictive
> > > check might be desirable.  Is there a way to ask the backend how
> > > the
> > > subreg bits are distributed?
> 
> …if the bits are distributed evenly in an obvious way, the hard
> register
> subreg should already have been folded to a reg.  I think subregs of
> hard registers are rare enough that we should just keep it simple
> and safe.

Ok! I'll make a proper patch out of this and post to gcc-patches then.

> 
> > > > Out of interest, why can't the subregs in the example above get
> > > > folded down to hard registers?
> > > 
> > > I think this is because the offsets are not 0.  I could imagine
> > > folding
> > > (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a
> > > backend hook for this.  Does anything like this exist?  Also, can
> > > (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply
> > > the second doubleword of 128-bit %v0 vector register.
> 
> For multi-register values with a regular layout,
> simplify_subreg_regno
> (via subreg_get_info) uses various target queries to work out which
> hard registers are actually being accessed.  Is the subreg above
> for big-endian or little-endian?  (I hope big-endian given the
> 8 offset.)

This is big-endian (s390).  Thanks for the info, register-related
target queries seem to be pretty extensive.  I still couldn't see
anything that would answer e.g. "what bits are accessed by
(reg:TF %f0)" with "bits 0-64 of %f0 and bits 0-64 of %f2", but that
level of detail is probably not needed anyway.

Reply via email to