On PowerPC, starting with ISA 2.07 (power8), moving a single precision value (SFmode) from a vector register to a GPR involves converting the scalar value in the register from double (DFmode) format to the 32-bit vector/storage format, doing the move to the GPR, and then shifting right 32 bits to get the value into the bottom 32 bits of the GPR for use as a scalar:
	xscvdpspn 0,1
	mfvsrd 3,0
	srdi 3,3,32

It turns out that the current processors, starting with ISA 2.06 (power7) through ISA 3.0 (power9), actually duplicate the 32-bit value produced by the XSCVDPSPN and XSCVDPSP instructions into the top 32 bits of the register and into the second 32-bit word.  This allows us to eliminate the shift instruction, since the value is already in the correct location for a 32-bit scalar.  ISA 3.0 is being updated to include this specification (and other fixes) so that future processors will also be able to eliminate the shift.  The new code is:

	xscvdpspn 0,1
	mfvsrwz 3,0

While I was working on the modifications, I noticed that if the user did a round from DFmode to SFmode and then tried to move it to a GPR, it would originally do:

	frsp 1,2
	xscvdpspn 0,1
	mfvsrd 3,0
	srdi 3,3,32

The XSCVDPSP instruction already handles values outside of the SFmode range (XSCVDPSPN does not), so I added a combiner pattern to combine the two instructions:

	xscvdpsp 0,1
	mfvsrwz 3,0

While I was looking at the code, I noticed that if we have a SImode value in a vector register and we want to sign extend it and leave the value in a GPR, on power8 the register allocator would decide to do a 32-bit integer store instruction and a sign extending load in the GPR to do the sign extension.  I added a splitter to convert this into a pair of MFVSRWZ and EXTSW instructions.

I built Spec 2006 with the changes, and I noticed the following changes in the code:

* Round DF->SF and move to GPR: namd, wrf;
* Eliminate 32-bit shift: gromacs, namd, povray, wrf;
* Use of MFVSRWZ/EXTSW: gromacs, povray, calculix, h264ref.

I have built these changes on the following machines with bootstrap and no regressions in the regression test:

* Big endian power7 (with both 32/64-bit targets);
* Little endian power8;
* Little endian power9 prototype.

Can I check these changes into GCC 8?  Can I back port these changes into the GCC 7 branch?
[gcc]
2017-09-19  Michael Meissner  <meiss...@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so it is
	next to vsx_xscvspdp.
	(vsx_xscvdpsp_scalar): Use 'ww' constraint instead of 'f' to allow
	SFmode values being in Altivec registers.
	(vsx_xscvdpspn): Eliminate unneeded alternative.  Use correct
	constraint ('ws') for DFmode.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvdpspn_scalar): Likewise.
	(peephole for optimizing move SF to GPR): Adjust code to eliminate
	needing to do the shift right 32-bits operation after XSCVDPSPN.
	* config/rs6000/rs6000.md (extendsi<mode>2): Add alternative to do
	sign extend from vector register to GPR via a split, preventing
	the register allocator from doing the move via store/load.
	(extendsi<mode>2 splitter): Likewise.
	(movsi_from_sf): Adjust code to eliminate doing a 32-bit shift
	right or vector extract after doing XSCVDPSPN.  Use MFVSRWZ
	instead of MFVSRD to move the value to a GPR register.
	(movdi_from_sf_zero_ext): Likewise.
	(movsi_from_df): Add optimization to merge a convert from DFmode
	to SFmode and moving the SFmode to a GPR to use XSCVDPSP instead
	of round and XSCVDPSPN.
	(reload_gpr_from_vsxsf): Use MFVSRWZ instead of MFVSRD to move the
	value to a GPR register.  Rename p8_mfvsrd_4_disf insn to
	p8_mfvsrwz_disf.
	(p8_mfvsrd_4_disf): Likewise.
	(p8_mfvsrwz_disf): Likewise.

[gcc/testsuite]
2017-09-19  Michael Meissner  <meiss...@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr71977-1.c: Adjust scan-assembler codes to
	reflect that we don't generate a 32-bit shift right after
	XSCVDPSPN.
	* gcc.target/powerpc/direct-move-float1.c: Likewise.
	* gcc.target/powerpc/direct-move-float3.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 252844)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1781,6 +1781,15 @@ (define_insn "vsx_xscvspdp"
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; Same as vsx_xscvspdp, but use SF as the type
+(define_insn "vsx_xscvspdp_scalar2"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
+	(unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
+		   UNSPEC_VSX_CVSPDP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xscvspdp %x0,%x1"
+  [(set_attr "type" "fp")])
+
 ;; Generate xvcvhpsp instruction
 (define_insn "vsx_xvcvhpsp"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
@@ -1794,41 +1803,32 @@ (define_insn "vsx_xvcvhpsp"
 ;; format of scalars is actually DF.
 (define_insn "vsx_xscvdpsp_scalar"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "f")]
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
		     UNSPEC_VSX_CVSPDP))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
-;; Same as vsx_xscvspdp, but use SF as the type
-(define_insn "vsx_xscvspdp_scalar2"
-  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
-	(unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
-		   UNSPEC_VSX_CVSPDP))]
-  "VECTOR_UNIT_VSX_P (V4SFmode)"
-  "xscvspdp %x0,%x1"
-  [(set_attr "type" "fp")])
-
 ;; ISA 2.07 xscvdpspn/xscvspdpn that does not raise an error on signalling NaNs
 (define_insn "vsx_xscvdpspn"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww,?ww")
-	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "wd,wa")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww")
+	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "ws")]
		     UNSPEC_VSX_CVDPSPN))]
   "TARGET_XSCVDPSPN"
   "xscvdpspn %x0,%x1"
   [(set_attr "type" "fp")])
 
 (define_insn "vsx_xscvspdpn"
-  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,?ws")
-	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wf,wa")]
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws")
+	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
		   UNSPEC_VSX_CVSPDPN))]
   "TARGET_XSCVSPDPN"
   "xscvspdpn %x0,%x1"
   [(set_attr "type" "fp")])
 
 (define_insn "vsx_xscvdpspn_scalar"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wf,?wa")
-	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww,ww")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
		     UNSPEC_VSX_CVDPSPN))]
   "TARGET_XSCVDPSPN"
   "xscvdpspn %x0,%x1"
@@ -4773,15 +4773,13 @@ (define_constants
 ;;
 ;;	(set (reg:DI reg3) (unspec:DI [(reg:V4SF reg2)] UNSPEC_P8V_RELOAD_FROM_VSX))
 ;;
-;;	(set (reg:DI reg3) (lshiftrt:DI (reg:DI reg3) (const_int 32)))
+;;	(set (reg:DI reg4) (and:DI (reg:DI reg3) (reg:DI reg3)))
 ;;
-;;	(set (reg:DI reg5) (and:DI (reg:DI reg3) (reg:DI reg4)))
+;;	(set (reg:DI reg5) (ashift:DI (reg:DI reg4) (const_int 32)))
 ;;
-;;	(set (reg:DI reg6) (ashift:DI (reg:DI reg5) (const_int 32)))
+;;	(set (reg:SF reg6) (unspec:SF [(reg:DI reg5)] UNSPEC_P8V_MTVSRD))
 ;;
-;;	(set (reg:SF reg7) (unspec:SF [(reg:DI reg6)] UNSPEC_P8V_MTVSRD))
-;;
-;;	(set (reg:SF reg7) (unspec:SF [(reg:SF reg7)] UNSPEC_VSX_CVSPDPN))
+;;	(set (reg:SF reg6) (unspec:SF [(reg:SF reg6)] UNSPEC_VSX_CVSPDPN))
 
 (define_peephole2
   [(match_scratch:DI SFBOOL_TMP_GPR "r")
@@ -4792,11 +4790,6 @@ (define_peephole2
 	(unspec:DI [(match_operand:V4SF SFBOOL_MFVSR_A "vsx_register_operand")]
 		   UNSPEC_P8V_RELOAD_FROM_VSX))
 
-   ;; SRDI
-   (set (match_dup SFBOOL_MFVSR_D)
-	(lshiftrt:DI (match_dup SFBOOL_MFVSR_D)
-		     (const_int 32)))
-
    ;; AND/IOR/XOR operation on int
    (set (match_operand:SI SFBOOL_BOOL_D "int_reg_operand")
 	(and_ior_xor:SI (match_operand:SI SFBOOL_BOOL_A1 "int_reg_operand")
@@ -4820,15 +4813,15 @@ (define_peephole2
    && (REG_P (operands[SFBOOL_BOOL_A2])
        || CONST_INT_P (operands[SFBOOL_BOOL_A2]))
    && (REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D])
-       || peep2_reg_dead_p (3, operands[SFBOOL_MFVSR_D]))
+       || peep2_reg_dead_p (2, operands[SFBOOL_MFVSR_D]))
    && (REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1])
        || (REG_P (operands[SFBOOL_BOOL_A2])
 	   && REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A2])))
    && REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_SHL_A])
    && (REGNO (operands[SFBOOL_SHL_D]) == REGNO (operands[SFBOOL_BOOL_D])
-       || peep2_reg_dead_p (4, operands[SFBOOL_BOOL_D]))
-   && peep2_reg_dead_p (5, operands[SFBOOL_SHL_D])"
+       || peep2_reg_dead_p (3, operands[SFBOOL_BOOL_D]))
+   && peep2_reg_dead_p (4, operands[SFBOOL_SHL_D])"
   [(set (match_dup SFBOOL_TMP_GPR)
 	(ashift:DI (match_dup SFBOOL_BOOL_A_DI)
 		   (const_int 32)))
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 252844)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -986,8 +986,11 @@ (define_insn_and_split "*extendhi<mode>2
 (define_insn "extendsi<mode>2"
-  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wl,wu,wj,wK,wH")
-	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,Z,Z,r,wK,wH")))]
+  [(set (match_operand:EXTSI 0 "gpc_reg_operand"
+		"=r,  r,  wl,  wu,  wj,  wK,  wH,  wr")
+
+	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand"
+		"Y,   r,  Z,   Z,   r,   wK,  wH,  ?wIwH")))]
   ""
   "@
    lwa%U1%X1 %0,%1
@@ -996,10 +999,23 @@ (define_insn "extendsi<mode>2"
    lxsiwax %x0,%y1
    mtvsrwa %x0,%1
    vextsw2d %0,%1
+   #
    #"
-  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm")
+  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm,mftgpr")
   (set_attr "sign_extend" "yes")
-  (set_attr "length" "4,4,4,4,4,4,8")])
+  (set_attr "length" "4,4,4,4,4,4,8,8")])
+
+(define_split
+  [(set (match_operand:DI 0 "int_reg_operand")
+	(sign_extend:DI (match_operand:SI 1 "vsx_register_operand")))]
+  "TARGET_DIRECT_MOVE_64BIT && reload_completed"
+  [(set (match_dup 2)
+	(match_dup 1))
+   (set (match_dup 0)
+	(sign_extend:DI (match_dup 2)))]
+{
+  operands[2] = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+})
 
 (define_split
   [(set (match_operand:DI 0 "altivec_register_operand")
@@ -6790,25 +6806,25 @@ (define_insn "*movsi_internal1_single"
 ;; needed.
 
 ;;	  MR          LWZ         LFIWZX      LXSIWZX     STW
-;;	  STFS        STXSSP      STXSSPX     VSX->GPR    MTVSRWZ
-;;	  VSX->VSX
+;;	  STFS        STXSSP      STXSSPX     VSX->GPR    VSX->VSX,
+;;	  MTVSRWZ
 
 (define_insn_and_split "movsi_from_sf"
   [(set (match_operand:SI 0 "nonimmediate_operand"
		"=r,         r,          ?*wI,       ?*wH,       m,
-		 m,          wY,         Z,          r,          wIwH,
-		 ?wK")
+		 m,          wY,         Z,          r,          ?*wIwH,
+		 wIwH")
 	(unspec:SI [(match_operand:SF 1 "input_operand"
		"r,          m,          Z,          Z,          r,
-		 f,          wb,         wu,         wIwH,       r,
-		 wK")]
+		 f,          wb,         wu,         wIwH,       wIwH,
+		 r")]
 		   UNSPEC_SI_FROM_SF))
    (clobber (match_scratch:V4SF 2
		"=X,         X,          X,          X,          X,
-		 X,          X,          X,          wa,         X,
-		 wa"))]
+		 X,          X,          X,          wa,         X,
+		 X"))]
   "TARGET_NO_SF_SUBREG
    && (register_operand (operands[0], SImode)
@@ -6823,10 +6839,10 @@ (define_insn_and_split "movsi_from_sf"
    stxssp %1,%0
    stxsspx %x1,%y0
    #
-   mtvsrwz %x0,%1
-   #"
+   xscvdpspn %x0,%x1
+   mtvsrwz %x0,%1"
   "&& reload_completed
-   && register_operand (operands[0], SImode)
+   && int_reg_operand (operands[0], SImode)
    && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
   [(const_int 0)]
 {
@@ -6836,50 +6852,38 @@ (define_insn_and_split "movsi_from_sf"
   rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
 
   emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
-  if (int_reg_operand (op0, SImode))
-    {
-      emit_insn (gen_p8_mfvsrd_4_disf (op0_di, op2));
-      emit_insn (gen_lshrdi3 (op0_di, op0_di, GEN_INT (32)));
-    }
-  else
-    {
-      rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
-      rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
-      emit_insn (gen_vextract4b (op0_di, op1_v16qi, byte_off));
-    }
-
+  emit_insn (gen_p8_mfvsrwz_disf (op0_di, op2));
   DONE;
 }
   [(set_attr "type"
		"*,          load,       fpload,     fpload,     store,
-		 fpstore,    fpstore,    fpstore,    mftgpr,     mffgpr,
-		 veclogical")
+		 fpstore,    fpstore,    fpstore,    mftgpr,     fp,
+		 mffgpr")
   (set_attr "length"
		"4,          4,          4,          4,          4,
-		 4,          4,          4,          12,         4,
-		 8")])
+		 4,          4,          4,          8,          4,
+		 4")])
 
 ;; movsi_from_sf with zero extension
 ;;
 ;;	  RLDICL      LWZ         LFIWZX      LXSIWZX     VSX->GPR
-;;	  MTVSRWZ     VSX->VSX
+;;	  VSX->VSX    MTVSRWZ
 
 (define_insn_and_split "*movdi_from_sf_zero_ext"
   [(set (match_operand:DI 0 "gpc_reg_operand"
		"=r,         r,          ?*wI,       ?*wH,       r,
-		 wIwH,       ?wK")
+		 wK,         wIwH")
 	(zero_extend:DI
 	 (unspec:SI [(match_operand:SF 1 "input_operand"
		"r,          m,          Z,          Z,          wIwH,
-		 r,          wK")]
+		 wIwH,       r")]
 		    UNSPEC_SI_FROM_SF)))
    (clobber (match_scratch:V4SF 2
		"=X,         X,          X,          X,          wa,
-		 X,          wa"))]
+		 X,          X"))]
   "TARGET_DIRECT_MOVE_64BIT
    && (register_operand (operands[0], DImode)
@@ -6890,9 +6894,10 @@ (define_insn_and_split "*movdi_from_sf_z
    lfiwzx %0,%y1
    lxsiwzx %x0,%y1
    #
-   mtvsrwz %x0,%1
-   #"
+   #
+   mtvsrwz %x0,%1"
   "&& reload_completed
+   && register_operand (operands[0], DImode)
    && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
   [(const_int 0)]
 {
@@ -6901,29 +6906,43 @@ (define_insn_and_split "*movdi_from_sf_z
   rtx op2 = operands[2];
 
   emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
   if (int_reg_operand (op0, DImode))
-    {
-      emit_insn (gen_p8_mfvsrd_4_disf (op0, op2));
-      emit_insn (gen_lshrdi3 (op0, op0, GEN_INT (32)));
-    }
+    emit_insn (gen_p8_mfvsrwz_disf (op0, op2));
   else
     {
-      rtx op0_si = gen_rtx_REG (SImode, REGNO (op0));
-      rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
-      rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
-      emit_insn (gen_vextract4b (op0_si, op1_v16qi, byte_off));
+      rtx op2_si = gen_rtx_REG (SImode, reg_or_subregno (op2));
+      emit_insn (gen_zero_extendsidi2 (op0, op2_si));
     }
   DONE;
 }
   [(set_attr "type"
		"*,          load,       fpload,     fpload,     mftgpr,
-		 mffgpr,     veclogical")
+		 vecexts,    mffgpr")
   (set_attr "length"
-		"4,          4,          4,          4,          12,
-		 4,          8")])
+		"4,          4,          4,          4,          8,
+		 8,          4")])
+
+;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
+;; moving it to SImode.  We can do a SFmode store without having to do the
+;; conversion explicitly.  If we are doing a register->register conversion, use
+;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
+;; input will not fit in a SFmode, and the latter assumes the value has already
+;; been rounded.
+(define_insn "*movsi_from_df"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=wa,m,wY,Z")
+	(unspec:SI [(float_truncate:SF
+		     (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
+		   UNSPEC_SI_FROM_SF))]
+
+  "TARGET_NO_SF_SUBREG"
+  "@
+   xscvdpsp %x0,%x1
+   stfs%U0%X0 %1,%0
+   stxssp %1,%0
+   stxsspx %x1,%y0"
+  [(set_attr "type" "fp,fpstore,fpstore,fpstore")])
 
 ;; Split a load of a large constant into the appropriate two-insn
 ;; sequence.
@@ -8437,19 +8456,20 @@ (define_insn_and_split "reload_gpr_from_
   rtx diop0 = simplify_gen_subreg (DImode, op0, SFmode, 0);
 
   emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-  emit_insn (gen_p8_mfvsrd_4_disf (diop0, op2));
-  emit_insn (gen_lshrdi3 (diop0, diop0, GEN_INT (32)));
+  emit_insn (gen_p8_mfvsrwz_disf (diop0, op2));
   DONE;
 }
   [(set_attr "length" "12")
   (set_attr "type" "three")])
 
-(define_insn "p8_mfvsrd_4_disf"
+;; XSCVDPSPN puts the 32-bit value in both the first and second words, so we do
+;; not need to do a shift to extract the value.
+(define_insn "p8_mfvsrwz_disf"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI [(match_operand:V4SF 1 "register_operand" "wa")]
 		   UNSPEC_P8V_RELOAD_FROM_VSX))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mfvsrd %0,%x1"
+  "mfvsrwz %0,%x1"
   [(set_attr "type" "mftgpr")])
Index: gcc/testsuite/gcc.target/powerpc/pr71977-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(revision 252844)
+++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(working copy)
@@ -23,9 +23,9 @@ mask_and_float_var (float f, uint32_t ma
   return u.value;
 }
 
-/* { dg-final { scan-assembler "\[ \t\]xxland " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]and " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]stxv" } } */
-/* { dg-final { scan-assembler-not "\[ \t\]lxv" } } */
-/* { dg-final { scan-assembler-not "\[ \t\]srdi " } } */
+/* { dg-final { scan-assembler {\mxxland\M} } } */
+/* { dg-final { scan-assembler-not {\mand\M} } } */
+/* { dg-final { scan-assembler-not {\mmfvsrwz\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv\M} } } */
+/* { dg-final { scan-assembler-not {\mlxv\M} } } */
+/* { dg-final { scan-assembler-not {\msrdi\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-float1.c	(revision 252844)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float1.c	(working copy)
@@ -5,7 +5,7 @@
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O2" } */
 /* { dg-final { scan-assembler "mtvsrd" } } */
-/* { dg-final { scan-assembler "mfvsrd" } } */
+/* { dg-final { scan-assembler "mfvsrwz" } } */
 /* { dg-final { scan-assembler "xscvdpspn" } } */
 /* { dg-final { scan-assembler "xscvspdpn" } } */
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-float3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float3.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mpower8-vector -O2" } */
+
+/* Test that we generate XSCVDPSP instead of FRSP and XSCVDPSPN when we combine
+   a round from double to float and moving the float value to a GPR.  */
+
+union u {
+  float f;
+  unsigned int ui;
+  int si;
+};
+
+unsigned int
+ui_d (double d)
+{
+  union u x;
+  x.f = d;
+  return x.ui;
+}
+
+/* { dg-final { scan-assembler {\mmfvsrwz\M} } } */
+/* { dg-final { scan-assembler {\mxscvdpsp\M} } } */
+/* { dg-final { scan-assembler-not {\mmtvsrd\M} } } */
+/* { dg-final { scan-assembler-not {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler-not {\msrdi\M} } } */