On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote: > > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote: > > > Another concern I had about this, besides using %L in asm output (what > > > forces TFmode to use just fprs?), is what happens when we're using > > > IEEE 128-bit floats? In that case it looks like we'd get just one reg. > > > > Good point that it breaks if the default long double (TFmode) type is IEEE > > 128-bit floating point. We would need to have two patterns, one that uses > > TFmode and one that uses IFmode. I wrote the power8 direct move stuff > > before > > going down the road of IEEE 128-bit floating point. > > Right. It's a bit unfortunate that we can't just use IFmode unconditionally, > but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and > then we probably shouldn't be using it.
Actually, we can use IFmode unconditionally. scalar_mode_supported_p is relevant only up to and including expand. Nothing prevents the backend from using IFmode. > Another option might be to use TDmode to allocate a scratch register pair. That won't work, at least if we want to extract the two component regs with simplify_gen_subreg, due to rs6000_cannot_change_mode_class. In my original patch I just extracted the regs by using gen_rtx_REG but I changed that, based on your criticism of using gen_rtx_REG in reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using operand regnos in other places. That particular change is of course entirely cosmetic. I also changed reload_vsx_from_gprsf to avoid mode punning regs, instead duplicating insn patterns as done elsewhere in the vsx support. I don't believe we will see subregs of vsx or fp regs after reload, but I'm quite willing to concede the point for a stage4 fix. Here's the revised patch. To recap, the main bug fixes here are: - stop reload_vsx_from_gprsf splitter from emitting a move not handled by movdi_internal64 - don't use TFmode, which cannot now be assumed to be IBM double-double. Secondary to that, not using or passing around TFmode means the %L restriction no longer matters, and constraints on the reload temp reg can be relaxed. Bootstrapped and regression tested powerpc64-linux biarch and powerpc64le-linux. OK David? PR target/68973 * config/rs6000/rs6000.md (reload_vsx_from_gprsf): Use p8_mtvsrd_sf rather than attempting to use movdi_internal64. Remove op0_di. (p8_mtvsrd_df, p8_mtvsrd_sf): New. (p8_mtvsrd_1, p8_mtvsrd_2): Delete. (p8_mtvsrwz): New. (p8_mtvsrwz_1, p8_mtvsrwz_2): Delete. (p8_xxpermdi_<mode>): Take two DF inputs rather than one TF. (p8_fmrgow_<mode>): Likewise. (reload_vsx_from_gpr<mode>): Make clobber IF. Adjust for above changes. (reload_fpr_from_gpr<mode>): Similarly. Use "d" for op0 constraint. * config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Make op1 SFmode. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index cdbf873..ec356cb 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -7488,41 +7488,31 @@ ;; value, since it is allocated in reload and not all of the flow information ;; is setup for it. We have two patterns to do the two moves between gprs and ;; fprs. There isn't a dependancy between the two, but we could potentially -;; schedule other instructions between the two instructions. TFmode is -;; currently limited to traditional FPR registers. If/when this is changed, we -;; will need to revist %L to make sure it works with VSX registers, or add an -;; %x version of %L. +;; schedule other instructions between the two instructions. (define_insn "p8_fmrgow_<mode>" [(set (match_operand:FMOVE64X 0 "register_operand" "=d") - (unspec:FMOVE64X [(match_operand:TF 1 "register_operand" "d")] + (unspec:FMOVE64X [ + (match_operand:DF 1 "register_operand" "d") + (match_operand:DF 2 "register_operand" "d")] UNSPEC_P8V_FMRGOW))] "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "fmrgow %0,%1,%L1" + "fmrgow %0,%1,%2" [(set_attr "type" "vecperm")]) -(define_insn "p8_mtvsrwz_1" - [(set (match_operand:TF 0 "register_operand" "=d") - (unspec:TF [(match_operand:SI 1 "register_operand" "r")] +(define_insn "p8_mtvsrwz" + [(set (match_operand:DF 0 "register_operand" "=d") + (unspec:DF [(match_operand:SI 1 "register_operand" "r")] UNSPEC_P8V_MTVSRWZ))] "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "mtvsrwz %x0,%1" [(set_attr "type" "mftgpr")]) -(define_insn "p8_mtvsrwz_2" - [(set (match_operand:TF 0 "register_operand" "+d") - (unspec:TF [(match_dup 0) - (match_operand:SI 1 "register_operand" "r")] - UNSPEC_P8V_MTVSRWZ))] - "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "mtvsrwz %L0,%1" - [(set_attr "type" "mftgpr")]) - (define_insn_and_split "reload_fpr_from_gpr<mode>" - [(set (match_operand:FMOVE64X 0 "register_operand" "=ws") + [(set (match_operand:FMOVE64X 0 "register_operand" "=d") (unspec:FMOVE64X [(match_operand:FMOVE64X 1 "register_operand" "r")] UNSPEC_P8V_RELOAD_FROM_GPR)) - (clobber (match_operand:TF 2 "register_operand" "=d"))] + (clobber (match_operand:IF 2 "register_operand" "=d"))] "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "#" "&& reload_completed" @@ -7530,42 +7520,36 @@ { rtx dest = operands[0]; rtx src = operands[1]; - rtx tmp = operands[2]; + rtx tmp_hi = simplify_gen_subreg (DFmode, operands[2], IFmode, 0); + rtx tmp_lo = simplify_gen_subreg (DFmode, operands[2], IFmode, 8); rtx gpr_hi_reg = gen_highpart (SImode, src); rtx gpr_lo_reg = gen_lowpart (SImode, src); - emit_insn (gen_p8_mtvsrwz_1 (tmp, gpr_hi_reg)); - emit_insn (gen_p8_mtvsrwz_2 (tmp, gpr_lo_reg)); - emit_insn (gen_p8_fmrgow_<mode> (dest, tmp)); + emit_insn (gen_p8_mtvsrwz (tmp_hi, gpr_hi_reg)); + emit_insn (gen_p8_mtvsrwz (tmp_lo, gpr_lo_reg)); + emit_insn (gen_p8_fmrgow_<mode> (dest, tmp_hi, tmp_lo)); DONE; } [(set_attr "length" "12") (set_attr "type" "three")]) ;; Move 128 bit values from GPRs to VSX registers in 64-bit mode -(define_insn "p8_mtvsrd_1" - [(set (match_operand:TF 0 "register_operand" "=ws") - (unspec:TF [(match_operand:DI 1 "register_operand" "r")] - UNSPEC_P8V_MTVSRD))] - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "mtvsrd %0,%1" - [(set_attr "type" "mftgpr")]) - -(define_insn "p8_mtvsrd_2" - [(set (match_operand:TF 0 "register_operand" "+ws") - (unspec:TF [(match_dup 0) - (match_operand:DI 1 "register_operand" "r")] +(define_insn "p8_mtvsrd_df" + [(set (match_operand:DF 0 "register_operand" "=wa") + (unspec:DF [(match_operand:DI 1 "register_operand" "r")] UNSPEC_P8V_MTVSRD))] "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "mtvsrd %L0,%1" + "mtvsrd %x0,%1" [(set_attr "type" "mftgpr")]) (define_insn "p8_xxpermdi_<mode>" [(set (match_operand:FMOVE128_GPR 0 "register_operand" "=wa") - (unspec:FMOVE128_GPR [(match_operand:TF 1 "register_operand" "ws")] - UNSPEC_P8V_XXPERMDI))] + (unspec:FMOVE128_GPR [ + (match_operand:DF 1 "register_operand" "wa") + (match_operand:DF 2 "register_operand" "wa")] + UNSPEC_P8V_XXPERMDI))] "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "xxpermdi %x0,%1,%L1,0" + "xxpermdi %x0,%x1,%x2,0" [(set_attr "type" "vecperm")]) (define_insn_and_split "reload_vsx_from_gpr<mode>" @@ -7573,7 +7557,7 @@ (unspec:FMOVE128_GPR [(match_operand:FMOVE128_GPR 1 "register_operand" "r")] UNSPEC_P8V_RELOAD_FROM_GPR)) - (clobber (match_operand:TF 2 "register_operand" "=ws"))] + (clobber (match_operand:IF 2 "register_operand" "=wa"))] "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "#" "&& reload_completed" @@ -7581,13 +7565,17 @@ { rtx dest = operands[0]; rtx src = operands[1]; - rtx tmp = operands[2]; + /* You might think that we could use op0 as one temp and a DF clobber + as op2, but you'd be wrong. Secondary reload move patterns don't + check for overlap of the clobber and the destination. */ + rtx tmp_hi = simplify_gen_subreg (DFmode, operands[2], IFmode, 0); + rtx tmp_lo = simplify_gen_subreg (DFmode, operands[2], IFmode, 8); rtx gpr_hi_reg = gen_highpart (DImode, src); rtx gpr_lo_reg = gen_lowpart (DImode, src); - emit_insn (gen_p8_mtvsrd_1 (tmp, gpr_hi_reg)); - emit_insn (gen_p8_mtvsrd_2 (tmp, gpr_lo_reg)); - emit_insn (gen_p8_xxpermdi_<mode> (dest, tmp)); + emit_insn (gen_p8_mtvsrd_df (tmp_hi, gpr_hi_reg)); + emit_insn (gen_p8_mtvsrd_df (tmp_lo, gpr_lo_reg)); + emit_insn (gen_p8_xxpermdi_<mode> (dest, tmp_hi, tmp_lo)); DONE; } [(set_attr "length" "12") @@ -7608,6 +7596,13 @@ ;; Move SFmode to a VSX from a GPR register. Because scalar floating point ;; type is stored internally as double precision in the VSX registers, we have ;; to convert it from the vector format. +(define_insn "p8_mtvsrd_sf" + [(set (match_operand:SF 0 "register_operand" "=wa") + (unspec:SF [(match_operand:DI 1 "register_operand" "r")] + UNSPEC_P8V_MTVSRD))] + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" + "mtvsrd %x0,%1" + [(set_attr "type" "mftgpr")]) (define_insn_and_split "reload_vsx_from_gprsf" [(set (match_operand:SF 0 "register_operand" "=wa") @@ -7622,16 +7617,12 @@ rtx op0 = operands[0]; rtx op1 = operands[1]; rtx op2 = operands[2]; - /* Also use the destination register to hold the unconverted DImode value. - This is conceptually a separate value from OP0, so we use gen_rtx_REG - rather than simplify_gen_subreg. */ - rtx op0_di = gen_rtx_REG (DImode, REGNO (op0)); rtx op1_di = simplify_gen_subreg (DImode, op1, SFmode, 0); /* Move SF value to upper 32-bits for xscvspdpn. */ emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32))); - emit_move_insn (op0_di, op2); - emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di)); + emit_insn (gen_p8_mtvsrd_sf (op0, op2)); + emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0)); DONE; } [(set_attr "length" "8") diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 997ff31..45af233 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -1521,7 +1521,7 @@ ;; Used by direct move to move a SFmode value from GPR to VSX register (define_insn "vsx_xscvspdpn_directmove" [(set (match_operand:SF 0 "vsx_register_operand" "=wa") - (unspec:SF [(match_operand:DI 1 "vsx_register_operand" "wa")] + (unspec:SF [(match_operand:SF 1 "vsx_register_operand" "wa")] UNSPEC_VSX_CVSPDPN))] "TARGET_XSCVSPDPN" "xscvspdpn %x0,%x1" -- Alan Modra Australia Development Lab, IBM