On Tue, Aug 29, 2017 at 06:14:37AM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Mon, Aug 28, 2017 at 02:50:02PM -0400, Michael Meissner wrote: > > When I added the optimization for loading 32-bit values directly into the > > vector registers from memory to convert to IEEE 128-bit floating point, I > > forgot to make sure the address did not have PRE_INCREMENT, etc. addressing. > > > * config/rs6000/rs6000.md (float_<mode>si2_hw): If register > > allocation hasn't been done, make sure the memory address is > > X-FORM (register+register). > > (floatuns_<mode>si2_hw2): Likewise. > > Why is it okay after RA but not before?
Because the Z constraint forces the RA to fix the address. Before RA, the address is: (MEM:SI (PRE_INC:DI (REG:DI ...))) When the insn is split before RA, the split insn is no longer valid. I could either have made the split test to be after RA (which would have moved the increment outside of the address) or generated a new pseudo to fix up the address. > > --- gcc/config/rs6000/rs6000.md (revision 251358) > > +++ gcc/config/rs6000/rs6000.md (working copy) > > @@ -14505,6 +14505,9 @@ (define_insn_and_split "float_<mode>si2_ > > { > > if (GET_CODE (operands[2]) == SCRATCH) > > operands[2] = gen_reg_rtx (DImode); > > + > > + if (MEM_P (operands[1]) && !reload_completed) > > + operands[1] = rs6000_address_for_fpconvert (operands[1]); > > }) > > It will need a comment here, then (other callers of > rs6000_address_for_fpconvert do not test for !reload_completed). rs6000_address_for_fpconvert cannot be called after RA because it allocates a pseudo in some cases. In theory the other cases where it is used is either in the define_expands or in define_splits before RA. I can certainly add a comment. You don't necessary need a gcc_assert in rs6000_address_for_fpconvert since creating the pseudo will cause a fatal. > Or maybe the predicate should be stricter in all these cases? > nonimmediate_operand allows a lot ;-) It is one of those balancing acts, if you make it too strict early on, you either don't get the optimization because it doesn't match during expand or combine, or you potentially get insn not found messages. > > --- gcc/testsuite/gcc.target/powerpc/pr81959.c (revision 0) > > +++ gcc/testsuite/gcc.target/powerpc/pr81959.c (revision 0) > > @@ -0,0 +1,25 @@ > > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */ > > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > > +/* { dg-options "-mpower9-vector -O2 -mfloat128" } */ > > powerpc*-*-*, or does that not work? As we were discussing on private IRC yesterday, in order to enable the ISA 3.0 IEEE 128-bit floating point instructions you need support for TImode, and TImode is not supported on 32-bit. Since the code only fails when converting SImode to KFmode, you need the && lp64. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797