On Tue, Aug 29, 2017 at 06:14:37AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Mon, Aug 28, 2017 at 02:50:02PM -0400, Michael Meissner wrote:
> > When I added the optimization for loading 32-bit values directly into the
> > vector registers from memory to convert to IEEE 128-bit floating point, I
> > forgot to make sure the address did not have PRE_INCREMENT, etc. addressing.
> 
> >     * config/rs6000/rs6000.md (float_<mode>si2_hw): If register
> >     allocation hasn't been done, make sure the memory address is
> >     X-FORM (register+register).
> >     (floatuns_<mode>si2_hw2): Likewise.
> 
> Why is it okay after RA but not before?

Because the Z constraint forces the RA to fix the address.  Before RA, the
address is:

        (MEM:SI (PRE_INC:DI (REG:DI ...)))

When the insn is split before RA, the split insn is no longer valid.  I could
either have made the split test to be after RA (which would have moved the
increment outside of the address) or generated a new pseudo to fix up the
address.

> > --- gcc/config/rs6000/rs6000.md     (revision 251358)
> > +++ gcc/config/rs6000/rs6000.md     (working copy)
> > @@ -14505,6 +14505,9 @@ (define_insn_and_split "float_<mode>si2_
> >  {
> >    if (GET_CODE (operands[2]) == SCRATCH)
> >      operands[2] = gen_reg_rtx (DImode);
> > +
> > +  if (MEM_P (operands[1]) && !reload_completed)
> > +    operands[1] = rs6000_address_for_fpconvert (operands[1]);
> >  })
> 
> It will need a comment here, then (other callers of
> rs6000_address_for_fpconvert do not test for !reload_completed).

rs6000_address_for_fpconvert cannot be called after RA because it allocates a
pseudo in some cases.  In theory the other cases where it is used is either in
the define_expands or in define_splits before RA.  I can certainly add a
comment.  You don't necessary need a gcc_assert in rs6000_address_for_fpconvert
since creating the pseudo will cause a fatal.

> Or maybe the predicate should be stricter in all these cases?
> nonimmediate_operand allows a lot ;-)

It is one of those balancing acts, if you make it too strict early on, you
either don't get the optimization because it doesn't match during expand or
combine, or you potentially get insn not found messages.

> > --- gcc/testsuite/gcc.target/powerpc/pr81959.c      (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/pr81959.c      (revision 0)
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-options "-mpower9-vector -O2 -mfloat128" } */
> 
> powerpc*-*-*, or does that not work?

As we were discussing on private IRC yesterday, in order to enable the ISA 3.0
IEEE 128-bit floating point instructions you need support for TImode, and
TImode is not supported on 32-bit.  Since the code only fails when converting
SImode to KFmode, you need the && lp64.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Reply via email to