On Mon, Feb 05, 2024 at 11:58:31AM +0800, Kewen.Lin wrote: > Hi Mike, I will comment on about 1/2 of the things, and come back with the other comments.
> on 2024/1/6 07:42, Michael Meissner wrote: > > This patch is a prelimianry patch to add the full 1,024 bit dense math > > register> (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto > > the top of the > > DMR register. > > > > This patch only adds the new 1,024 bit register support. It does not add > > support for any instructions that need 1,024 bit registers instead of 512 > > bit > > registers. > > > > I used the new mode 'TDOmode' to be the opaque mode used for 1,204 bit > > typo: 1,204 Thanks. > > +(define_insn_and_split "*movtdo" > > + [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa") > > + (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))] > > + "TARGET_DENSE_MATH > > + && (gpc_reg_operand (operands[0], TDOmode) > > + || gpc_reg_operand (operands[1], TDOmode))" > > + "@ > > + # > > + # > > + # > > + # > > + dmmr %0,%1 > > + #" > > + "&& reload_completed > > + && (!dmr_operand (operands[0], TDOmode) || !dmr_operand (operands[1], > > TDOmode))" > > + [(const_int 0)] > > +{ > > + rtx op0 = operands[0]; > > + rtx op1 = operands[1]; > > + > > + if (REG_P (op0) && REG_P (op1)) > > + { > > + int regno0 = REGNO (op0); > > + int regno1 = REGNO (op1); > > + > > + if (DMR_REGNO_P (regno0) && VSX_REGNO_P (regno1)) > > + { > > + rtx op1_upper = gen_rtx_REG (XOmode, regno1); > > + rtx op1_lower = gen_rtx_REG (XOmode, regno1 + 4); > > + emit_insn (gen_movtdo_insert512_upper (op0, op1_upper)); > > + emit_insn (gen_movtdo_insert512_lower (op0, op0, op1_lower)); > > + DONE; > > + } > > + > > + else if (VSX_REGNO_P (regno0) && DMR_REGNO_P (regno1)) > > + { > > + rtx op0_upper = gen_rtx_REG (XOmode, regno0); > > + rtx op0_lower = gen_rtx_REG (XOmode, regno0 + 4); > > + emit_insn (gen_movtdo_extract512 (op0_upper, op1, const0_rtx)); > > + emit_insn (gen_movtdo_extract512 (op0_lower, op1, const1_rtx)); > > + DONE; > > + } > > Add an assertion like gcc_assert (VSX_REGNO_P (regno1) && VSX_REGNO_P > (regno2))? Ok. > > + > > +;; Reload DMR registers from memory > > +(define_insn_and_split "reload_dmr_from_memory" > > + [(set (match_operand:TDO 0 "dmr_operand" "=wD") > > + (unspec:TDO [(match_operand:TDO 1 "memory_operand" "m")] > > + UNSPEC_DMR_RELOAD_FROM_MEMORY)) > > + (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))] > > + "TARGET_DENSE_MATH" > > + "#" > > + "&& reload_completed" > > + [(const_int 0)] > > +{ > > + rtx dest = operands[0]; > > + rtx src = operands[1]; > > + rtx tmp = operands[2]; > > + rtx mem_upper = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 0 : 32); > > + rtx mem_lower = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 32 : 0); > > I think the offset should be 64 rather than 32. Good catch, thanks. > > + > > + emit_move_insn (tmp, mem_upper); > > + emit_insn (gen_movtdo_insert512_upper (dest, tmp)); > > + > > + emit_move_insn (tmp, mem_lower); > > + emit_insn (gen_movtdo_insert512_lower (dest, dest, tmp)); > > + DONE; > > +} > > + [(set_attr "length" "16") > > + (set_attr "max_prefixed_insns" "2") > > + (set_attr "type" "vecload")]) > > + > > +;; Reload dense math registers to memory > > +(define_insn_and_split "reload_dmr_to_memory" > > + [(set (match_operand:TDO 0 "memory_operand" "=m") > > + (unspec:TDO [(match_operand:TDO 1 "dmr_operand" "wD")] > > + UNSPEC_DMR_RELOAD_TO_MEMORY)) > > + (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))] > > + "TARGET_DENSE_MATH" > > + "#" > > + "&& reload_completed" > > + [(const_int 0)] > > +{ > > + rtx dest = operands[0]; > > + rtx src = operands[1]; > > + rtx tmp = operands[2]; > > + rtx mem_upper = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 0 : 32); > > + rtx mem_lower = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 32 : 0); > > Ditto. Yep. > > diff --git a/gcc/config/rs6000/rs6000-builtin.cc > > b/gcc/config/rs6000/rs6000-builtin.cc > > index 6698274031b..54868d2009c 100644 > > --- a/gcc/config/rs6000/rs6000-builtin.cc > > +++ b/gcc/config/rs6000/rs6000-builtin.cc > > @@ -495,6 +495,8 @@ const char *rs6000_type_string (tree type_node) > > return "__vector_pair"; > > else if (type_node == vector_quad_type_node) > > return "__vector_quad"; > > + else if (type_node == dmr_type_node) > > + return "__dmr"; > > > > return "unknown"; > > } > > @@ -781,6 +783,17 @@ rs6000_init_builtins (void) > > t = build_qualified_type (vector_quad_type_node, TYPE_QUAL_CONST); > > ptr_vector_quad_type_node = build_pointer_type (t); > > > > + dmr_type_node = make_node (OPAQUE_TYPE); > > + SET_TYPE_MODE (dmr_type_node, TDOmode); > > + TYPE_SIZE (dmr_type_node) = bitsize_int (GET_MODE_BITSIZE (TDOmode)); > > + TYPE_PRECISION (dmr_type_node) = GET_MODE_BITSIZE (TDOmode); > > + TYPE_SIZE_UNIT (dmr_type_node) = size_int (GET_MODE_SIZE (TDOmode)); > > + SET_TYPE_ALIGN (dmr_type_node, 512); > > why not 1024? Since we don't have a 1,024 bit load/store and have to use multiple vector pair or vector load/stores, there is no reason to ask for a 1,024 alignment. In addition, I would worry that having a larger alignment might be an issue with the stack, since I don't believe we have support for aligning the stack to 1,024 bit boundaries. > > --- a/gcc/config/rs6000/rs6000-call.cc > > +++ b/gcc/config/rs6000/rs6000-call.cc > > @@ -437,7 +437,8 @@ rs6000_return_in_memory (const_tree type, const_tree > > fntype ATTRIBUTE_UNUSED) > > if (cfun > > && !cfun->machine->mma_return_type_error > > && TREE_TYPE (cfun->decl) == fntype > > - && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode)) > > + && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode > > + || TYPE_MODE (type) == TDOmode)) > > May be just with OPAQUE_MODE_P (TYPE_MODE (type)) for all the cases on type > mode. Basically I forgot about using OPAQUE_MODE in this case. Using OPAQUE_MODE is better. > So far only rs6000 defines OPAQUE_MODE, if we are worried that there are some > generic opaque modes > some day, we can probably add one assertion somewhere to guaratee it. Or add > one macro like > OPAQUE_MMA_MODE_P to ensure it only matches {OO,XO,TDO}mode. > > > { > > /* Record we have now handled function CFUN, so the next time we > > are called, we do not re-report the same error. */ > > @@ -1641,6 +1642,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const > > function_arg_info &arg) > > return NULL_RTX; > > } > > > > + if (mode == TDOmode) > > + { > > + if (TYPE_CANONICAL (type) != NULL_TREE) > > + type = TYPE_CANONICAL (type); > > + error ("invalid use of dense math operand of type %qs as a function " > > + "parameter", > > + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)))); > > + return NULL_RTX; > > + } > > Can we merge this hunk into the above hunk for OOmode and XOmode? Then the > code with TYPE_CANONICAL > can be shared and better to maintain. IMHO, this dense math operand is also > MMA operand so the above > error message still works, if it's desired to note this dense math operand > then we can use > (mode == TDOmode)? "dense math": "MMA" for the different string part. I will need to look into this later. > > + > > /* Return a marker to indicate whether CR1 needs to set or clear the > > bit that V.4 uses to say fp args were passed in registers. > > Assume that we don't need the marker for software floating point, > > diff --git a/gcc/config/rs6000/rs6000-modes.def > > b/gcc/config/rs6000/rs6000-modes.def > > index 094b246c834..60ebb363196 100644 > > --- a/gcc/config/rs6000/rs6000-modes.def > > +++ b/gcc/config/rs6000/rs6000-modes.def > > @@ -86,3 +86,7 @@ PARTIAL_INT_MODE (TI, 128, PTI); > > /* Modes used by __vector_pair and __vector_quad. */ > > OPAQUE_MODE (OO, 32); > > OPAQUE_MODE (XO, 64); > > + > > +/* Modes used by __dmr. */ > > Nit: s/Modes/Mode/ > > > +OPAQUE_MODE (TDO, 128); > > + > > I assumed that "TD" stands for something but I have no idea (at least not > obvious to me), > could we also put some comments for it? Basically Segher and I went back and forth on the names. I would have to dig into my notes what TDO stands for. > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > > index 59517c8608d..aed4b72c4ea 100644 > > --- a/gcc/config/rs6000/rs6000.cc > > +++ b/gcc/config/rs6000/rs6000.cc > > @@ -1846,7 +1846,9 @@ rs6000_hard_regno_nregs_internal (int regno, > > machine_mode mode) > > 128-bit floating point that can go in vector registers, which has VSX > > memory addressing. */ > > if (FP_REGNO_P (regno)) > > - reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode) > > + reg_size = (VECTOR_MEM_VSX_P (mode) > > + || VECTOR_ALIGNMENT_P (mode) > > + || mode == TDOmode > > Redundant change, since VECTOR_ALIGNMENT_P considers TDOmode as this patch > changes. Ok. And I'll get back to the rest of the comments shortly. -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com