PowerPC GCC has traditionally only allowed DImode to go into FPR registers (and now VSX registers) in order to allow floating point conversions. Conversions to/from SImode have always had to deal with special UNSPECs to allow the generation of the LFIWAX, LXSIWAX, LFIWZX, LXSIWZX, STFIWX, and STXSIWX instructions, since SImode was not allowed in the registers.
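As a concrete illustration (this is not part of the patch), the sketch below shows the kind of source that exercises these instructions: a 32-bit integer is loaded from memory and converted to double, or a double is truncated and stored as a 32-bit integer. The function names are hypothetical, the code is plain portable C, and which instructions GCC actually selects depends on the target and options.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from the patch.  */

/* Signed 32-bit load feeding an int -> double conversion
   (a candidate for LFIWAX / LXSIWAX on PowerPC).  */
double
signed_to_double (const int32_t *p)
{
  return (double) *p;
}

/* Unsigned 32-bit load feeding an unsigned -> double conversion
   (a candidate for LFIWZX / LXSIWZX on PowerPC).  */
double
unsigned_to_double (const uint32_t *p)
{
  return (double) *p;
}

/* double -> int truncation followed by a 32-bit store
   (a candidate for STFIWX / STXSIWX on PowerPC).  */
void
double_to_signed (int32_t *p, double x)
{
  *p = (int32_t) x;
}
```

Compiling functions like these with -mcpu=power8 -O2 and reading the assembly output is one way to check whether the integer goes through a GPR or is loaded directly into an FPR/VSX register.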
This patch adds support for ISA 2.07 (power8) and above to allow SImode values in the vector registers. It adds a new debug switch (-mvsx-small-integer) that can turn off this support.

I have done bootstrap builds on a little endian 64-bit Power8 system and on a big endian 64-bit Power8 system; both builds completed with no regressions in the test suite. I have started a big endian 64-bit Power7 build that supports both 32-bit and 64-bit calls. If the power7 build finishes without regressions, can I check in this patch?

In addition, I did a full SPEC 2006 run with an earlier version of the patch that did not make -mvsx-small-integer the default, enabling it with an explicit switch. All 29 benchmarks in the CPU 2006 suite ran. The 436.cactusADM benchmark had a 3% improvement with -mvsx-small-integer, and all of the other benchmarks were performance neutral. The difference is that the patch eliminates some GPR loads/stores and direct moves in favor of the 32-bit integer load/store instructions that operate on the FPRs directly.

[gcc]
2016-10-26  Michael Meissner  <meiss...@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (wH constraint): Add new
	constraints for allowing 32-bit integers (and eventually 8/16-bit
	integers) into the vector registers.
	(wI constraint): Likewise.
	(wJ constraint): Likewise.
	(wK constraint): Likewise.
	* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Add
	-mvsx-small-integer as a default option for ISA 2.07
	(i.e. power8).
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000.opt (-mvsx-small-integer): Add new debug
	switch to turn off small integer support in vector registers.
	* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Eliminate
	test for -mupper-regs-di, since it is already done with
	reg_addr[mode].scalar_in_vmx_p.  Add support for the switch
	-mvsx-small-integer.
	(rs6000_debug_reg_global): Add support for wH, wI, wJ, and wK
	constraints.
	(rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_option_override_internal): Add consistency checks for
	-mvsx-small-integer.
	(rs6000_secondary_reload_simple_move): SImode is a simple move if
	-mvsx-small-integer.
	(rs6000_secondary_reload): Use std::swap.
	(rs6000_preferred_reload_class): Don't prefer FLOAT_REGS over
	VSX_REGS for small integers in vector registers, since there is no
	D-FORM address mode for such types.
	(rs6000_register_move_cost): Use FIRST_FPR_REGNO instead of 32.
	(rs6000_opt_masks): Add -mvsx-small-integer.
	* config/rs6000/vsx.md (VSINT_84): Add SImode for small integer
	support.
	(VSX_EXTRACT_I2): Clone VSX_EXTRACT_I, but drop V4SI since SImode
	extracts can be done on ISA 2.07.
	(vsx_extract_<mode>): Add support for small integers in vsx
	registers.
	(vsx_extract_<mode>_p9): Use 'v' instead of VSX_EX, since we no
	longer support V4SImode in this pattern.
	(vsx_extract_si): New insn to support extraction of SImode in ISA
	2.07 using either xxextractuw or vspltw.
	(vsx_extract_<mode>_p8): Use 'v' instead of VSX_EX, since we no
	longer support V4SImode in this pattern.
	* config/rs6000/rs6000.h (enum rs6000_reg_class_enum): Add wH, wI,
	wJ, and wK constraints.
	* config/rs6000/rs6000.md (f32_sv): Use correct instruction for
	storing SDmode with VSX instructions.
	(zero_extendsi<mode>2): Reorder pattern, so that the FPR and VSX
	loads come after RLDICL but before MTVSRWZ.  Remove ??, ! from the
	constraints.  Add MFVSRWZ and XXEXTRACTUW instructions to support
	small integers in vector registers.
	(extendsi<mode>2): Reorder pattern, so that the FPR and VSX loads
	come after EXTSW but before MTVSRWA.  Remove ??, ! from the
	constraints.  Add VEXTSW2D support for small integers in vector
	registers.
	(lfiwax): Remove ! constraint.  Add VEXTSW2D support for small
	integers in vector registers.
	(floatsi<mode>2_lfiwax): If -mvsx-small-integer issue a normal
	move instead of using an UNSPEC.
	(lfiwzx): Remove ! constraint.  Add XXEXTRACTUW support for small
	integers in vector registers.
	(floatunssi<mode>2_lfiwzx): If -mvsx-small-integer issue a normal
	move instead of using an UNSPEC.
	(movsi_internal1): Add support for -mvsx-small-integer.  Align
	columns so that it is more readable.
	(SImode splitter for ISA 3.0 constants): Add splitter for
	-128..127 constants that can easily be constructed on ISA 3.0.
	* doc/md.texi (PowerPC Constraints): Document wH, wI, wJ, and wK
	constraints.

[gcc/testsuite]
2016-10-26  Michael Meissner  <meiss...@linux.vnet.ibm.com>

	* gcc.target/powerpc/vsx-simode.c: New test.
	* gcc.target/powerpc/vsx-simode2.c: Likewise.
	* gcc.target/powerpc/vsx-simode3.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
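For readers who want the consistency check listed for rs6000_option_override_internal spelled out, here is a minimal stand-alone sketch of the same logic. The mask names and values are hypothetical stand-ins for the OPTION_MASK_* bits, and the boolean return value stands in for the error () call in the real code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OPTION_MASK_* bits; the values are
   arbitrary and chosen only for this sketch.  */
enum
{
  MASK_DIRECT_MOVE       = 1 << 0,
  MASK_P8_VECTOR         = 1 << 1,
  MASK_UPPER_REGS_DI     = 1 << 2,
  MASK_VSX_SMALL_INTEGER = 1 << 3
};

/* If small integer support is requested but a prerequisite is missing,
   clear the small integer bit.  Return false only when the user asked
   for the option explicitly (where the patch reports an error).  */
bool
check_small_integer (uint32_t *flags, uint32_t explicit_flags)
{
  const uint32_t required
    = MASK_DIRECT_MOVE | MASK_P8_VECTOR | MASK_UPPER_REGS_DI;
  bool ok = true;

  if ((*flags & MASK_VSX_SMALL_INTEGER) != 0
      && (*flags & required) != required)
    {
      if ((explicit_flags & MASK_VSX_SMALL_INTEGER) != 0)
        ok = false;

      *flags &= ~(uint32_t) MASK_VSX_SMALL_INTEGER;
    }

  return ok;
}
```

The point of the silent path is that a default-on option is quietly dropped when the selected CPU lacks a prerequisite, while an explicit request with a missing prerequisite is diagnosed.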
Index: gcc/config/rs6000/constraints.md =================================================================== --- gcc/config/rs6000/constraints.md (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 241274) +++ gcc/config/rs6000/constraints.md (.../gcc/config/rs6000) (working copy) @@ -159,6 +159,18 @@ (define_memory_constraint "wG" "Memory operand suitable for TOC fusion memory references" (match_operand 0 "toc_fusion_mem_wrapped")) +(define_register_constraint "wH" "rs6000_constraints[RS6000_CONSTRAINT_wH]" + "Altivec register to hold 32-bit integers or NO_REGS.") + +(define_register_constraint "wI" "rs6000_constraints[RS6000_CONSTRAINT_wI]" + "FPR register to hold 32-bit integers or NO_REGS.") + +(define_register_constraint "wJ" "rs6000_constraints[RS6000_CONSTRAINT_wJ]" + "FPR register to hold 8/16-bit integers or NO_REGS.") + +(define_register_constraint "wK" "rs6000_constraints[RS6000_CONSTRAINT_wK]" + "Altivec register to hold 8/16-bit integers or NO_REGS.") + (define_constraint "wL" "Int constant that is the element number mfvsrld accesses in a vector." (and (match_code "const_int") Index: gcc/config/rs6000/rs6000-cpus.def =================================================================== --- gcc/config/rs6000/rs6000-cpus.def (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 241274) +++ gcc/config/rs6000/rs6000-cpus.def (.../gcc/config/rs6000) (working copy) @@ -58,7 +58,8 @@ | OPTION_MASK_HTM \ | OPTION_MASK_QUAD_MEMORY \ | OPTION_MASK_QUAD_MEMORY_ATOMIC \ - | OPTION_MASK_UPPER_REGS_SF) + | OPTION_MASK_UPPER_REGS_SF \ + | OPTION_MASK_VSX_SMALL_INTEGER) /* Add ISEL back into ISA 3.0, since it is supposed to be a win. Do not add P9_MINMAX until the hardware that supports it is available. 
Do not add @@ -138,6 +139,7 @@ | OPTION_MASK_UPPER_REGS_DF \ | OPTION_MASK_UPPER_REGS_SF \ | OPTION_MASK_VSX \ + | OPTION_MASK_VSX_SMALL_INTEGER \ | OPTION_MASK_VSX_TIMODE) #endif Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 241274) +++ gcc/config/rs6000/rs6000.opt (.../gcc/config/rs6000) (working copy) @@ -664,3 +664,7 @@ Enable using IEEE 128-bit floating point mfloat128-convert Target Undocumented Mask(FLOAT128_CVT) Var(rs6000_isa_flags) Enable default conversions between __float128 & long double. + +mvsx-small-integer +Target Report Mask(VSX_SMALL_INTEGER) Var(rs6000_isa_flags) +Enable small integers to be in VSX registers. Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 241274) +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy) @@ -1977,8 +1977,7 @@ rs6000_hard_regno_mode_ok (int regno, ma || FLOAT128_VECTOR_P (mode) || reg_addr[mode].scalar_in_vmx_p || (TARGET_VSX_TIMODE && mode == TImode) - || (TARGET_VADDUQM && mode == V1TImode) - || (TARGET_UPPER_REGS_DI && mode == DImode))) + || (TARGET_VADDUQM && mode == V1TImode))) { if (FP_REGNO_P (regno)) return FP_REGNO_P (last_regno); @@ -2009,9 +2008,14 @@ rs6000_hard_regno_mode_ok (int regno, ma && FP_REGNO_P (last_regno)) return 1; - if (GET_MODE_CLASS (mode) == MODE_INT - && GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD) - return 1; + if (GET_MODE_CLASS (mode) == MODE_INT) + { + if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD) + return 1; + + if (TARGET_VSX_SMALL_INTEGER && mode == SImode) + return 1; + } if (PAIRED_SIMD_REGNO_P (regno) && TARGET_PAIRED_FLOAT && PAIRED_VECTOR_MODE (mode)) @@ -2444,6 +2448,10 @@ rs6000_debug_reg_global (void) "wx reg_class = %s\n" "wy reg_class = 
%s\n" "wz reg_class = %s\n" + "wH reg_class = %s\n" + "wI reg_class = %s\n" + "wJ reg_class = %s\n" + "wK reg_class = %s\n" "\n", reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]], @@ -2471,7 +2479,11 @@ rs6000_debug_reg_global (void) reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_ww]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]], reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wy]], - reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wz]]); + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wz]], + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wH]], + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wI]], + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wJ]], + reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wK]]); nl = "\n"; for (m = 0; m < NUM_MACHINE_MODES; ++m) @@ -2767,6 +2779,7 @@ rs6000_setup_reg_addr_masks (void) { machine_mode m2 = (machine_mode) m; bool complex_p = false; + bool small_int_p = (m2 == QImode || m2 == HImode || m2 == SImode); size_t msize; if (COMPLEX_MODE_P (m2)) @@ -2791,13 +2804,20 @@ rs6000_setup_reg_addr_masks (void) /* Can mode values go in the GPR/FPR/Altivec registers? */ if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg]) { + bool small_int_vsx_p = (small_int_p + && (rc == RELOAD_REG_FPR + || rc == RELOAD_REG_VMX)); + nregs = rs6000_hard_regno_nregs[m][reg]; addr_mask |= RELOAD_REG_VALID; /* Indicate if the mode takes more than 1 physical register. If it takes a single register, indicate it can do REG+REG - addressing. */ - if (nregs > 1 || m == BLKmode || complex_p) + addressing. Small integers in VSX registers can only do + REG+REG addressing. 
*/ + if (small_int_vsx_p) + addr_mask |= RELOAD_REG_INDEXED; + else if (nregs > 1 || m == BLKmode || complex_p) addr_mask |= RELOAD_REG_MULTIPLE; else addr_mask |= RELOAD_REG_INDEXED; @@ -2814,6 +2834,7 @@ rs6000_setup_reg_addr_masks (void) && !VECTOR_MODE_P (m2) && !FLOAT128_VECTOR_P (m2) && !complex_p + && !small_int_vsx_p && (m2 != DFmode || !TARGET_UPPER_REGS_DF) && (m2 != SFmode || !TARGET_UPPER_REGS_SF) && !(TARGET_E500_DOUBLE && msize == 8)) @@ -3112,7 +3133,10 @@ rs6000_init_hard_regno_mode_ok (bool glo ww - Register class to do SF conversions in with VSX operations. wx - Float register if we can do 32-bit int stores. wy - Register class to do ISA 2.07 SF operations. - wz - Float register if we can do 32-bit unsigned int loads. */ + wz - Float register if we can do 32-bit unsigned int loads. + wI - VSX register if SImode is allowed in VSX registers. + wJ - VSX register if QImode/HImode are allowed in VSX registers. + wK - Altivec register if QImode/HImode are allowed in VSX registers. */ if (TARGET_HARD_FLOAT && TARGET_FPRS) rs6000_constraints[RS6000_CONSTRAINT_f] = FLOAT_REGS; /* SFmode */ @@ -3206,6 +3230,18 @@ rs6000_init_hard_regno_mode_ok (bool glo if (TARGET_DIRECT_MOVE_128) rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS; + /* Support small integers in VSX registers. */ + if (TARGET_VSX_SMALL_INTEGER) + { + rs6000_constraints[RS6000_CONSTRAINT_wH] = ALTIVEC_REGS; + rs6000_constraints[RS6000_CONSTRAINT_wI] = FLOAT_REGS; + if (TARGET_P9_VECTOR) + { + rs6000_constraints[RS6000_CONSTRAINT_wJ] = FLOAT_REGS; + rs6000_constraints[RS6000_CONSTRAINT_wK] = ALTIVEC_REGS; + } + } + /* Set up the reload helper and direct move functions. */ if (TARGET_VSX || TARGET_ALTIVEC) { @@ -3358,6 +3394,9 @@ rs6000_init_hard_regno_mode_ok (bool glo if (TARGET_UPPER_REGS_SF) reg_addr[SFmode].scalar_in_vmx_p = true; + + if (TARGET_VSX_SMALL_INTEGER) + reg_addr[SImode].scalar_in_vmx_p = true; } /* Setup the fusion operations. 
*/ @@ -4430,6 +4469,20 @@ rs6000_option_override_internal (bool gl } } + /* Check whether we should allow small integers into VSX registers. We + require direct move to prevent the register allocator from having to move + variables through memory to do moves. SImode can be used on ISA 2.07, + while HImode and QImode require ISA 3.0. */ + if (TARGET_VSX_SMALL_INTEGER + && (!TARGET_DIRECT_MOVE || !TARGET_P8_VECTOR || !TARGET_UPPER_REGS_DI)) + { + if (rs6000_isa_flags_explicit & OPTION_MASK_VSX_SMALL_INTEGER) + error ("-mvsx-small-integer requires -mpower8-vector, " + "-mupper-regs-di, and -mdirect-move"); + + rs6000_isa_flags &= ~OPTION_MASK_VSX_SMALL_INTEGER; + } + /* Set long double size before the IEEE 128-bit tests. */ if (!global_options_set.x_rs6000_long_double_type_size) { @@ -20363,32 +20416,46 @@ rs6000_secondary_reload_simple_move (enu enum rs6000_reg_type from_type, machine_mode mode) { - int size; + int size = GET_MODE_SIZE (mode); /* Add support for various direct moves available. In this function, we only look at cases where we don't need any extra registers, and one or more - simple move insns are issued. At present, 32-bit integers are not allowed + simple move insns are issued. Originally small integers were not allowed in FPR/VSX registers. Single precision binary floating is not a simple move because we need to convert to the single precision memory layout. The 4-byte SDmode can be moved. TDmode values are disallowed since they need special direct move handling, which we do not support yet. */ - size = GET_MODE_SIZE (mode); if (TARGET_DIRECT_MOVE - && ((mode == SDmode) || (TARGET_POWERPC64 && size == 8)) && ((to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE) || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE))) - return true; + { + if (TARGET_POWERPC64) + { + /* ISA 2.07: MTVSRD or MFVSRD. 
*/ + if (size == 8) + return true; - else if (TARGET_DIRECT_MOVE_128 && size == 16 && mode != TDmode - && ((to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE) - || (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE))) - return true; + /* ISA 3.0: MTVSRDD or MFVSRD + MFVSRLD. */ + if (size == 16 && TARGET_P9_VECTOR && mode != TDmode) + return true; + } + + /* ISA 2.07: MTVSRWZ or MFVSRWZ. */ + if (TARGET_VSX_SMALL_INTEGER && mode == SImode) + return true; + + /* ISA 2.07: MTVSRWZ or MFVSRWZ. */ + if (mode == SDmode) + return true; + } + /* Power6+: MFTGPR or MFFGPR. */ else if (TARGET_MFPGPR && TARGET_POWERPC64 && size == 8 - && ((to_type == GPR_REG_TYPE && from_type == FPR_REG_TYPE) - || (to_type == FPR_REG_TYPE && from_type == GPR_REG_TYPE))) + && ((to_type == GPR_REG_TYPE && from_type == FPR_REG_TYPE) + || (to_type == FPR_REG_TYPE && from_type == GPR_REG_TYPE))) return true; + /* Move to/from SPR. */ else if ((size == 4 || (TARGET_POWERPC64 && size == 8)) && ((to_type == GPR_REG_TYPE && from_type == SPR_REG_TYPE) || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE))) @@ -20564,11 +20631,7 @@ rs6000_secondary_reload (bool in_p, enum rs6000_reg_type from_type = register_to_reg_type (x, &altivec_p); if (!in_p) - { - enum rs6000_reg_type exchange = to_type; - to_type = from_type; - from_type = exchange; - } + std::swap (to_type, from_type); /* Can we do a direct move of some sort? */ if (rs6000_secondary_reload_move (to_type, from_type, mode, sri, @@ -21196,7 +21259,8 @@ rs6000_preferred_reload_class (rtx x, en /* If this is a scalar floating point value and we don't have D-form addressing, prefer the traditional floating point registers so that we can use D-form (register+offset) addressing. 
*/ - if (GET_MODE_SIZE (mode) < 16 && rclass == VSX_REGS) + if (rclass == VSX_REGS + && (mode == SFmode || GET_MODE_SIZE (mode) == 8)) return FLOAT_REGS; /* Prefer the Altivec registers if Altivec is handling the vector @@ -35767,7 +35831,7 @@ rs6000_register_move_cost (machine_mode else if (VECTOR_MEM_VSX_P (mode) && reg_classes_intersect_p (to, VSX_REGS) && reg_classes_intersect_p (from, VSX_REGS)) - ret = 2 * hard_regno_nregs[32][mode]; + ret = 2 * hard_regno_nregs[FIRST_FPR_REGNO][mode]; /* Moving between two similar registers is just one instruction. */ else if (reg_classes_intersect_p (to, from)) @@ -37373,6 +37437,7 @@ static struct rs6000_opt_mask const rs60 { "upper-regs-df", OPTION_MASK_UPPER_REGS_DF, false, true }, { "upper-regs-sf", OPTION_MASK_UPPER_REGS_SF, false, true }, { "vsx", OPTION_MASK_VSX, false, true }, + { "vsx-small-integer", OPTION_MASK_VSX_SMALL_INTEGER, false, true }, { "vsx-timode", OPTION_MASK_VSX_TIMODE, false, true }, #ifdef OPTION_MASK_64BIT #if TARGET_AIX_OS Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 241274) +++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy) @@ -263,11 +263,14 @@ (define_mode_attr VS_64reg [(V2DF "ws") (V2DI "wi")]) ;; Iterators for loading constants with xxspltib -(define_mode_iterator VSINT_84 [V4SI V2DI DI]) +(define_mode_iterator VSINT_84 [V4SI V2DI DI SI]) (define_mode_iterator VSINT_842 [V8HI V4SI V2DI]) -;; Iterator for ISA 3.0 vector extract/insert of integer vectors -(define_mode_iterator VSX_EXTRACT_I [V16QI V8HI V4SI]) +;; Iterator for ISA 3.0 vector extract/insert of small integer vectors. +;; VSX_EXTRACT_I2 doesn't include V4SImode because SI extracts can be +;; done on ISA 2.07 and not just ISA 3.0. 
+(define_mode_iterator VSX_EXTRACT_I [V16QI V8HI V4SI]) +(define_mode_iterator VSX_EXTRACT_I2 [V16QI V8HI]) ;; Mode attribute to give the correct predicate for ISA 3.0 vector extract and ;; insert to validate the operand number. @@ -2476,7 +2479,9 @@ (define_expand "vsx_extract_<mode>" (clobber (match_dup 3))])] "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" { - operands[3] = gen_rtx_SCRATCH ((TARGET_VEXTRACTUB) ? DImode : <MODE>mode); + machine_mode smode = ((<MODE>mode != V4SImode && TARGET_VEXTRACTUB) + ? DImode : <MODE>mode); + operands[3] = gen_rtx_SCRATCH (smode); }) ;; Under ISA 3.0, we can use the byte/half-word/word integer stores if we are @@ -2485,9 +2490,9 @@ (define_expand "vsx_extract_<mode>" (define_insn_and_split "*vsx_extract_<mode>_p9" [(set (match_operand:<VS_scalar> 0 "nonimmediate_operand" "=r,Z") (vec_select:<VS_scalar> - (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "<VSX_EX>,<VSX_EX>") + (match_operand:VSX_EXTRACT_I2 1 "gpc_reg_operand" "v,v") (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "n,n")]))) - (clobber (match_scratch:DI 3 "=<VSX_EX>,<VSX_EX>"))] + (clobber (match_scratch:DI 3 "=v,v"))] "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_VEXTRACTUB" "#" "&& (reload_completed || MEM_P (operands[0]))" @@ -2516,8 +2521,6 @@ (define_insn_and_split "*vsx_extract_<m emit_insn (gen_p9_stxsibx (dest, di_tmp)); else if (<MODE>mode == V8HImode) emit_insn (gen_p9_stxsihx (dest, di_tmp)); - else if (<MODE>mode == V4SImode) - emit_insn (gen_stfiwx (dest, di_tmp)); else gcc_unreachable (); } @@ -2550,12 +2553,70 @@ (define_insn "vsx_extract_<mode>_di" } [(set_attr "type" "vecsimple")]) +(define_insn_and_split "*vsx_extract_si" + [(set (match_operand:SI 0 "nonimmediate_operand" "=r,Z,Z,wJwK") + (vec_select:SI + (match_operand:V4SI 1 "gpc_reg_operand" "v,wJwK,v,v") + (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")]))) + (clobber (match_scratch:V4SI 3 "=v,wJwK,v,v"))] + "VECTOR_MEM_VSX_P (V4SImode) && 
TARGET_DIRECT_MOVE_64BIT" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx dest = operands[0]; + rtx src = operands[1]; + rtx element = operands[2]; + rtx vec_tmp = operands[3]; + int value; + + if (!VECTOR_ELT_ORDER_BIG) + element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); + + /* If the value is in the correct position, we can avoid doing the VSPLT<x> + instruction. */ + value = INTVAL (element); + if (value != 1) + { + if (TARGET_VEXTRACTUB) + { + rtx di_tmp = gen_rtx_REG (DImode, REGNO (vec_tmp)); + emit_insn (gen_vsx_extract_v4si_di (di_tmp,src, element)); + } + else + emit_insn (gen_altivec_vspltw_direct (vec_tmp, src, element)); + } + else + vec_tmp = src; + + if (MEM_P (operands[0])) + { + if (can_create_pseudo_p ()) + dest = rs6000_address_for_fpconvert (dest); + + if (TARGET_VSX_SMALL_INTEGER) + emit_move_insn (dest, gen_rtx_REG (SImode, REGNO (vec_tmp))); + else + emit_insn (gen_stfiwx (dest, gen_rtx_REG (DImode, REGNO (vec_tmp)))); + } + + else if (TARGET_VSX_SMALL_INTEGER) + emit_move_insn (dest, gen_rtx_REG (SImode, REGNO (vec_tmp))); + else + emit_move_insn (gen_rtx_REG (DImode, REGNO (dest)), + gen_rtx_REG (DImode, REGNO (vec_tmp))); + + DONE; +} + [(set_attr "type" "mftgpr,fpstore,fpstore,vecsimple") + (set_attr "length" "8")]) + (define_insn_and_split "*vsx_extract_<mode>_p8" [(set (match_operand:<VS_scalar> 0 "nonimmediate_operand" "=r") (vec_select:<VS_scalar> - (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v") + (match_operand:VSX_EXTRACT_I2 1 "gpc_reg_operand" "v") (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "n")]))) - (clobber (match_scratch:VSX_EXTRACT_I 3 "=v"))] + (clobber (match_scratch:VSX_EXTRACT_I2 3 "=v"))] "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" @@ -2587,13 +2648,6 @@ (define_insn_and_split "*vsx_extract_<m else vec_tmp = src; } - else if (<MODE>mode == V4SImode) - { - if (value != 1) - emit_insn (gen_altivec_vspltw_direct (vec_tmp, src, 
element)); - else - vec_tmp = src; - } else gcc_unreachable (); Index: gcc/config/rs6000/rs6000.h =================================================================== --- gcc/config/rs6000/rs6000.h (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 241274) +++ gcc/config/rs6000/rs6000.h (.../gcc/config/rs6000) (working copy) @@ -1602,6 +1602,10 @@ enum r6000_reg_class_enum { RS6000_CONSTRAINT_wx, /* FPR register for STFIWX */ RS6000_CONSTRAINT_wy, /* VSX register for SF */ RS6000_CONSTRAINT_wz, /* FPR register for LFIWZX */ + RS6000_CONSTRAINT_wH, /* Altivec register for 32-bit integers. */ + RS6000_CONSTRAINT_wI, /* VSX register for 32-bit integers. */ + RS6000_CONSTRAINT_wJ, /* VSX register for 8/16-bit integers. */ + RS6000_CONSTRAINT_wK, /* Altivec register for 8/16-bit integers. */ RS6000_CONSTRAINT_MAX }; Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 241274) +++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy) @@ -458,7 +458,7 @@ (define_mode_attr f32_sm [(SF "m") (define_mode_attr f32_sm2 [(SF "wY") (SD "wn")]) (define_mode_attr f32_si [(SF "stfs%U0%X0 %1,%0") (SD "stfiwx %1,%y0")]) (define_mode_attr f32_si2 [(SF "stxssp %1,%0") (SD "stfiwx %1,%y0")]) -(define_mode_attr f32_sv [(SF "stxsspx %x1,%y0") (SD "stxsiwzx %x1,%y0")]) +(define_mode_attr f32_sv [(SF "stxsspx %x1,%y0") (SD "stxsiwx %x1,%y0")]) ; Definitions for 32-bit fpr direct move ; At present, the decimal modes are not allowed in the traditional altivec @@ -837,16 +837,18 @@ (define_insn_and_split "*zero_extendhi<m (define_insn "zero_extendsi<mode>2" - [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,??wj,!wz,!wu") - (zero_extend:EXTSI (match_operand:SI 1 "reg_or_mem_operand" "m,r,r,Z,Z")))] + [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wz,wu,wj,r,wJwK") + (zero_extend:EXTSI 
(match_operand:SI 1 "reg_or_mem_operand" "m,r,Z,Z,r,wIwH,wJwK")))] "" "@ lwz%U1%X1 %0,%1 rldicl %0,%1,0,32 - mtvsrwz %x0,%1 lfiwzx %0,%y1 - lxsiwzx %x0,%y1" - [(set_attr "type" "load,shift,mffgpr,fpload,fpload")]) + lxsiwzx %x0,%y1 + mtvsrwz %x0,%1 + mfvsrwz %0,%x1 + xxextractuw %x0,%x1,1" + [(set_attr "type" "load,shift,fpload,fpload,mffgpr,mftgpr,vecexts")]) (define_insn_and_split "*zero_extendsi<mode>2_dot" [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y") @@ -1005,16 +1007,17 @@ (define_insn_and_split "*extendhi<mode>2 (define_insn "extendsi<mode>2" - [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,??wj,!wl,!wu") - (sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,r,Z,Z")))] + [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wl,wu,wj,wK") + (sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,Z,Z,r,wK")))] "" "@ lwa%U1%X1 %0,%1 extsw %0,%1 - mtvsrwa %x0,%1 lfiwax %0,%y1 - lxsiwax %x0,%y1" - [(set_attr "type" "load,exts,mffgpr,fpload,fpload") + lxsiwax %x0,%y1 + mtvsrwa %x0,%1 + vextsw2d %0,%1" + [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts") (set_attr "sign_extend" "yes")]) (define_insn_and_split "*extendsi<mode>2_dot" @@ -4947,15 +4950,16 @@ (define_insn "*xxsel<mode>" ; We don't define lfiwax/lfiwzx with the normal definition, because we ; don't want to support putting SImode in FPR registers. 
(define_insn "lfiwax" - [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wj,!wj") - (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r")] + [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wj,wj,wK") + (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,wK")] UNSPEC_LFIWAX))] "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && TARGET_LFIWAX" "@ lfiwax %0,%y1 lxsiwax %x0,%y1 - mtvsrwa %x0,%1" - [(set_attr "type" "fpload,fpload,mffgpr")]) + mtvsrwa %x0,%1 + vextsw2d %0,%1" + [(set_attr "type" "fpload,fpload,mffgpr,vecexts")]) ; This split must be run before register allocation because it allocates the ; memory slot that is needed to move values to/from the FPR. We don't allocate @@ -5019,7 +5023,10 @@ (define_insn_and_split "floatsi<mode>2_l operands[1] = rs6000_address_for_fpconvert (operands[1]); if (GET_CODE (operands[2]) == SCRATCH) operands[2] = gen_reg_rtx (DImode); - emit_insn (gen_lfiwax (operands[2], operands[1])); + if (TARGET_VSX_SMALL_INTEGER) + emit_insn (gen_extendsidi2 (operands[2], operands[1])); + else + emit_insn (gen_lfiwax (operands[2], operands[1])); emit_insn (gen_floatdi<mode>2 (operands[0], operands[2])); DONE; }" @@ -5027,15 +5034,16 @@ (define_insn_and_split "floatsi<mode>2_l (set_attr "type" "fpload")]) (define_insn "lfiwzx" - [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wj,!wj") - (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r")] + [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wj,wj,wJwK") + (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,wJwK")] UNSPEC_LFIWZX))] "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && TARGET_LFIWZX" "@ lfiwzx %0,%y1 lxsiwzx %x0,%y1 - mtvsrwz %x0,%1" - [(set_attr "type" "fpload,fpload,mftgpr")]) + mtvsrwz %x0,%1 + xxextractuw %x0,%x1,1" + [(set_attr "type" "fpload,fpload,mftgpr,vecexts")]) (define_insn_and_split "floatunssi<mode>2_lfiwzx" [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>") @@ -5094,7 +5102,10 @@ 
(define_insn_and_split "floatunssi<mode> operands[1] = rs6000_address_for_fpconvert (operands[1]); if (GET_CODE (operands[2]) == SCRATCH) operands[2] = gen_reg_rtx (DImode); - emit_insn (gen_lfiwzx (operands[2], operands[1])); + if (TARGET_VSX_SMALL_INTEGER) + emit_insn (gen_zero_extendsidi2 (operands[2], operands[1])); + else + emit_insn (gen_lfiwzx (operands[2], operands[1])); emit_insn (gen_floatdi<mode>2 (operands[0], operands[2])); DONE; }" @@ -6518,25 +6529,66 @@ (define_insn "movsi_low" [(set_attr "type" "load") (set_attr "length" "4")]) -(define_insn "*movsi_internal1" - [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" "=r,r,r,m,r,r,r,r,*c*l,*h,*h") - (match_operand:SI 1 "input_operand" "r,U,m,r,I,L,n,*h,r,r,0"))] +;; MR LA LWZ LFIWZX LXSIWZX +;; STW STFIWX STXSIWX LI LIS +;; # XXLOR XXSPLTIB 0 XXSPLTIB -1 VSPLTISW +;; XXLXOR 0 XXLORC -1 P9 const MTVSRWZ MFVSRWZ +;; MF%1 MT%0 MT%0 NOP +(define_insn "*movsi_internal1" + [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" + "=r, r, r, ?*wI, ?*wH, + m, ?Z, ?Z, r, r, + r, ?*wIwH, ?*wJwK, ?*wK, ?*wJwK, + ?*wJwK, ?*wH, ?*wK, ?*wIwH, ?r, + r, *c*l, *h, *h") + + (match_operand:SI 1 "input_operand" + "r, U, m, Z, Z, + r, wI, wH, I, L, + n, wIwH, O, wM, wB, + O, wM, wS, r, wIwH, + *h, r, r, 0"))] + "!TARGET_SINGLE_FPU && (gpc_reg_operand (operands[0], SImode) || gpc_reg_operand (operands[1], SImode))" "@ mr %0,%1 la %0,%a1 lwz%U1%X1 %0,%1 + lfiwzx %0,%y1 + lxsiwzx %x0,%y1 stw%U0%X0 %1,%0 + stfiwx %1,%y0 + stxsiwx %x1,%y0 li %0,%1 lis %0,%v1 # + xxlor %x0,%x1,%x1 + xxspltib %x0,0 + xxspltib %x0,255 + vspltisw %0,%1 + xxlxor %x0,%x0,%x0 + xxlorc %x0,%x0,%x0 + # + mtvsrwz %x0,%1 + mfvsrwz %0,%x1 mf%1 %0 mt%0 %1 mt%0 %1 nop" - [(set_attr "type" "*,*,load,store,*,*,*,mfjmpr,mtjmpr,*,*") - (set_attr "length" "4,4,4,4,4,4,8,4,4,4,4")]) + [(set_attr "type" + "*, *, load, fpload, fpload, + store, fpstore, fpstore, *, *, + *, veclogical, vecsimple, vecsimple, vecsimple, + veclogical, veclogical, vecsimple, mffgpr, 
mftgpr, + *, *, *, *") + + (set_attr "length" + "4, 4, 4, 4, 4, + 4, 4, 4, 4, 4, + 8, 4, 4, 4, 4, + 4, 4, 8, 4, 4, + 4, 4, 4, 4")]) (define_insn "*movsi_internal1_single" [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" "=r,r,r,m,r,r,r,r,*c*l,*h,*h,m,*f") @@ -6581,6 +6633,23 @@ (define_split FAIL; }") +;; Split loading -128..127 to use XXSPLTIB and VEXTSW2D +(define_split + [(set (match_operand:DI 0 "altivec_register_operand" "") + (match_operand:DI 1 "xxspltib_constant_split" ""))] + "TARGET_VSX_SMALL_INTEGER && TARGET_P9_VECTOR && reload_completed" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + int r = REGNO (op0); + rtx op0_v16qi = gen_rtx_REG (V16QImode, r); + + emit_insn (gen_xxspltib_v16qi (op0_v16qi, op1)); + emit_insn (gen_vsx_sign_extend_qi_si (operands[0], op0_v16qi)); + DONE; +}) + (define_insn "*mov<mode>_internal2" [(set (match_operand:CC 2 "cc_reg_operand" "=y,x,?y") (compare:CC (match_operand:P 1 "gpc_reg_operand" "0,r,r") Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/doc) (revision 241274) +++ gcc/doc/md.texi (.../gcc/doc) (working copy) @@ -3125,6 +3125,18 @@ Memory operand suitable for power9 fusio @item wG Memory operand suitable for TOC fusion memory references. +@item wH +Altivec register if @option{-mvsx-small-integer}. + +@item wI +Floating point register if @option{-mvsx-small-integer}. + +@item wJ +FP register if @option{-mvsx-small-integer} and @option{-mpower9-vector}. + +@item wK +Altivec register if @option{-mvsx-small-integer} and @option{-mpower9-vector}. + @item wL Int constant that is the element number that the MFVSRLD instruction. targets. 
Index: gcc/testsuite/gcc.target/powerpc/vsx-simode.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-simode.c (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-simode.c (.../gcc/testsuite/gcc.target/powerpc) (revision 241600) @@ -0,0 +1,22 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O2 -mvsx-small-integer" } */ + +double load_asm_d_constraint (int *p) +{ + double ret; + __asm__ ("xxlor %x0,%x1,%x1\t# load d constraint" : "=d" (ret) : "d" (*p)); + return ret; +} + +void store_asm_d_constraint (int *p, double x) +{ + int i; + __asm__ ("xxlor %x0,%x1,%x1\t# store d constraint" : "=d" (i) : "d" (x)); + *p = i; +} + +/* { dg-final { scan-assembler "lfiwzx" } } */ +/* { dg-final { scan-assembler "stfiwx" } } */ Index: gcc/testsuite/gcc.target/powerpc/vsx-simode2.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-simode2.c (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-simode2.c (.../gcc/testsuite/gcc.target/powerpc) (revision 241600) @@ -0,0 +1,15 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O2 -mvsx-small-integer" } */ + +unsigned int foo (unsigned int u) +{ + unsigned int ret; + __asm__ ("xxlor %x0,%x1,%x1\t# v, v constraints" : "=v" (ret) : "v" (u)); + 
return ret; +} + +/* { dg-final { scan-assembler "mtvsrwz" } } */ +/* { dg-final { scan-assembler "mfvsrwz" } } */ Index: gcc/testsuite/gcc.target/powerpc/vsx-simode3.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-simode3.c (svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-simode3.c (.../gcc/testsuite/gcc.target/powerpc) (revision 241600) @@ -0,0 +1,22 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O2 -mvsx-small-integer" } */ + +double load_asm_v_constraint (int *p) +{ + double ret; + __asm__ ("xxlor %x0,%x1,%x1\t# load v constraint" : "=d" (ret) : "v" (*p)); + return ret; +} + +void store_asm_v_constraint (int *p, double x) +{ + int i; + __asm__ ("xxlor %x0,%x1,%x1\t# store v constraint" : "=v" (i) : "d" (x)); + *p = i; +} + +/* { dg-final { scan-assembler "lxsiwzx" } } */ +/* { dg-final { scan-assembler "stxsiwx" } } */
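As a side note on the ISA 3.0 constant splitter in the rs6000.md part of the patch, the idea of building a -128..127 constant from a byte splat followed by a sign extension can be modelled in portable C. This is only a sketch of the arithmetic, not of the RTL: the model_* names are hypothetical, and it assumes the usual two's complement behaviour when converting an out-of-range value to int8_t.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a byte splat: replicate one byte into all
   16 byte lanes of a vector register.  */
void
model_splat_byte (uint8_t vec[16], int8_t imm)
{
  memset (vec, (uint8_t) imm, 16);
}

/* Hypothetical model of sign-extending one byte lane to 32 bits.
   After a splat every lane holds the same byte, so the lane choice
   (and therefore endianness) does not affect the result.  */
int32_t
model_sign_extend_byte (const uint8_t vec[16], int lane)
{
  return (int32_t) (int8_t) vec[lane];
}

/* Build a small constant the way the two-step sequence would:
   splat the low byte, then sign extend it.  */
int32_t
model_load_small_constant (int value)
{
  uint8_t vec[16];

  model_splat_byte (vec, (int8_t) value);
  return model_sign_extend_byte (vec, 0);
}
```

Every value in -128..127 survives the round trip, which is why that range can be handled without loading the constant from memory.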