Re: [PATCH V2, 3/3] Add support for 1,024 bit DMF registers on a future PowerPC.

Avinash Jayakar Mon, 17 Nov 2025 03:05:38 -0800

Hi Michael,

I am working on adding builtins for the SHA instructions on PowerPC, and I
have two questions regarding the new __dmf type/keyword:

1. The current MMA builtin functions, e.g. __builtin_mma_xvbf16ger2pp, take
a __vector_quad * parameter. Would the signatures of these builtins need to
change to use the new __dmf type when both dense math and MMA are enabled?

2. Would it be possible to add a __dmf_pair (2,048 bit) type, which some of
the SHA instructions (dmsha3hash) need? If not, I can follow the same
approach this patch uses for __dmf, but I am not sure whether the name
should be __dmf_pair or __dmf2048.

Also, I have one small comment on the patch below.

Thanks and regards,
Avinash Jayakar

On Fri, 2025-11-14 at 02:57 -0500, Michael Meissner wrote:
> This patch is a preliminary patch to add the full 1,024 bit dense math registers
> (DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
> DMR register.
> 
> This patch only adds the new 1,024 bit register support.  It does not add
> support for any instructions that need 1,024 bit registers instead of 512 bit
> registers.
> 
> I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
> registers.  The 'wD' constraint added in previous patches is used for these
> registers.  I added support to do load and store of DMRs via the VSX registers,
> since there are no load/store dense math instructions.  I added the new keyword
> '__dmf' to create 1,024 bit types that can be loaded into DMRs.  At present, I
> don't have aliases for __dmf512 and __dmf1024 that we've discussed internally.
> 
> I have built bootstrap GCC compilers on little endian and big endian
> PowerPC servers, and there were no regressions.  Can I commit this
> patch to GCC 16 once the following patches have been applied?
> 
>   * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700539.html
>   * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700540.html
>   * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700542.html
>   * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700543.html
> 
> 2025-11-14   Michael Meissner  <[email protected]>
> 
> gcc/
> 
>       * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
>       (UNSPEC_DM_INSERT512_LOWER): Likewise.
>       (UNSPEC_DM_EXTRACT512): Likewise.
>       (UNSPEC_DMF_RELOAD_FROM_MEMORY): Likewise.
>       (UNSPEC_DMF_RELOAD_TO_MEMORY): Likewise.
>       (movtdo): New define_expand and define_insn_and_split to implement 1,024
>       bit DMR registers.
>       (movtdo_insert512_upper): New insn.
>       (movtdo_insert512_lower): Likewise.
>       (movtdo_extract512): Likewise.
>       (reload_dmf_from_memory): Likewise.
>       (reload_dmf_to_memory): Likewise.
>       * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMF
>       support.
>       (rs6000_init_builtins): Add support for __dmf keyword.
>       * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
>       for TDOmode.
>       (rs6000_function_arg): Likewise.
>       * config/rs6000/rs6000-modes.def (TDOmode): New mode.
>       * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
>       support for TDOmode.
>       (rs6000_hard_regno_mode_ok_uncached): Likewise.
>       (rs6000_hard_regno_mode_ok): Likewise.
>       (rs6000_modes_tieable_p): Likewise.
>       (rs6000_debug_reg_global): Likewise.
>       (rs6000_setup_reg_addr_masks): Likewise.
>       (rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
>       hooks for DMF mode.
>       (reg_offset_addressing_ok_p): Add support for TDOmode.
>       (rs6000_emit_move): Likewise.
>       (rs6000_secondary_reload_simple_move): Likewise.
>       (rs6000_preferred_reload_class): Likewise.
>       (rs6000_secondary_reload_class): Likewise.
>       (rs6000_mangle_type): Add mangling for __dmf type.
>       (rs6000_dmf_register_move_cost): Add support for TDOmode.
>       (rs6000_split_multireg_move): Likewise.
>       (rs6000_invalid_conversion): Likewise.
>       * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
>       (enum rs6000_builtin_type_index): Add DMF type nodes.
>       (dmf_type_node): Likewise.
>       (ptr_dmf_type_node): Likewise.
> 
> gcc/testsuite/
> 
>       * gcc.target/powerpc/dm-1024bit.c: New test.
>       * lib/target-supports.exp (check_effective_target_ppc_dmf_ok): New
>       target test.
> ---
>  gcc/config/rs6000/mma.md                      | 154 ++++++++++++++++++
>  gcc/config/rs6000/rs6000-builtin.cc           |  17 ++
>  gcc/config/rs6000/rs6000-call.cc              |  10 +-
>  gcc/config/rs6000/rs6000-modes.def            |   4 +
>  gcc/config/rs6000/rs6000.cc                   | 101 ++++++++----
>  gcc/config/rs6000/rs6000.h                    |   6 +-
>  gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  63 +++++++
>  gcc/testsuite/lib/target-supports.exp         |  35 ++++
>  8 files changed, 356 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 3f5852ca2bb..d7df2a1a71a 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -91,6 +91,11 @@ (define_c_enum "unspec"
>     UNSPEC_MMA_XXMFACC
>     UNSPEC_MMA_XXMTACC
>     UNSPEC_MMA_DMSETDMRZ
> +   UNSPEC_DM_INSERT512_UPPER
> +   UNSPEC_DM_INSERT512_LOWER
> +   UNSPEC_DM_EXTRACT512
> +   UNSPEC_DMF_RELOAD_FROM_MEMORY
> +   UNSPEC_DMF_RELOAD_TO_MEMORY
>    ])
>  
>  (define_c_enum "unspecv"
> @@ -699,3 +704,152 @@ (define_insn "mma_<avvi4i4i4>"
>    "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
>    [(set_attr "type" "mma")
>     (set_attr "prefixed" "yes")])
> +
> +;; TDOmode (__dmf keyword for 1,024 bit registers).
> +(define_expand "movtdo"
> +  [(set (match_operand:TDO 0 "nonimmediate_operand")
> +     (match_operand:TDO 1 "input_operand"))]
> +  "TARGET_DENSE_MATH"
> +{
> +  rs6000_emit_move (operands[0], operands[1], TDOmode);
> +  DONE;
> +})
> +
> +(define_insn_and_split "*movtdo"
> +  [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa")
> +     (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))]
> +  "TARGET_DENSE_MATH
> +   && (gpc_reg_operand (operands[0], TDOmode)
> +       || gpc_reg_operand (operands[1], TDOmode))"
> +  "@
> +   #
> +   #
> +   #
> +   #
> +   dmmr %0,%1
> +   #"
> +  "&& reload_completed
> +   && (!dmf_operand (operands[0], TDOmode) || !dmf_operand (operands[1], TDOmode))"
> +  [(const_int 0)]
> +{
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +
> +  if (REG_P (op0) && REG_P (op1))
> +    {
> +      int regno0 = REGNO (op0);
> +      int regno1 = REGNO (op1);
> +
> +      if (DMF_REGNO_P (regno0) && VSX_REGNO_P (regno1))
> +     {
> +       rtx op1_upper = gen_rtx_REG (XOmode, regno1);
> +       rtx op1_lower = gen_rtx_REG (XOmode, regno1 + 4);
> +       emit_insn (gen_movtdo_insert512_upper (op0, op1_upper));
> +       emit_insn (gen_movtdo_insert512_lower (op0, op0, op1_lower));
> +       DONE;
> +     }
> +
> +      else if (VSX_REGNO_P (regno0) && DMF_REGNO_P (regno1))
> +     {
> +       rtx op0_upper = gen_rtx_REG (XOmode, regno0);
> +       rtx op0_lower = gen_rtx_REG (XOmode, regno0 + 4);
> +       emit_insn (gen_movtdo_extract512 (op0_upper, op1, const0_rtx));
> +       emit_insn (gen_movtdo_extract512 (op0_lower, op1, const1_rtx));
> +       DONE;
> +     }
> +
> +     else
> +     gcc_assert (VSX_REGNO_P (regno0) && VSX_REGNO_P (regno1));
> +    }
> +
> +  rs6000_split_multireg_move (operands[0], operands[1]);
> +  DONE;
> +}
> +  [(set_attr "type" "vecload,vecstore,vecmove,vecmove,vecmove,vecmove")
> +   (set_attr "length" "*,*,32,8,*,8")
> +   (set_attr "max_prefixed_insns" "4,4,*,*,*,*")])
> +
> +;; Move from VSX registers to DMF registers via two insert 512 bit
> +;; instructions.
> +(define_insn "movtdo_insert512_upper"
> +  [(set (match_operand:TDO 0 "dmf_operand" "=wD")
> +     (unspec:TDO [(match_operand:XO 1 "vsx_register_operand" "wa")]
> +                 UNSPEC_DM_INSERT512_UPPER))]
> +  "TARGET_DENSE_MATH"
> +  "dmxxinstdmr512 %0,%1,%Y1,0"
> +  [(set_attr "type" "mma")])
> +
> +(define_insn "movtdo_insert512_lower"
> +  [(set (match_operand:TDO 0 "dmf_operand" "=wD")
> +     (unspec:TDO [(match_operand:TDO 1 "dmf_operand" "0")
> +                  (match_operand:XO 2 "vsx_register_operand" "wa")]
> +                 UNSPEC_DM_INSERT512_LOWER))]
> +  "TARGET_DENSE_MATH"
> +  "dmxxinstdmr512 %0,%2,%Y2,1"
> +  [(set_attr "type" "mma")])
> +
> +;; Move from DMF registers to VSX registers via two extract 512 bit
> +;; instructions.
> +(define_insn "movtdo_extract512"
> +  [(set (match_operand:XO 0 "vsx_register_operand" "=wa")
> +     (unspec:XO [(match_operand:TDO 1 "dmf_operand" "wD")
> +                 (match_operand 2 "const_0_to_1_operand" "n")]
> +                UNSPEC_DM_EXTRACT512))]
> +  "TARGET_DENSE_MATH"
> +  "dmxxextfdmr512 %0,%Y0,%1,%2"
> +  [(set_attr "type" "mma")])
> +
> +;; Reload DMF registers from memory
> +(define_insn_and_split "reload_dmf_from_memory"
> +  [(set (match_operand:TDO 0 "dmf_operand" "=wD")
> +     (unspec:TDO [(match_operand:TDO 1 "memory_operand" "m")]
> +                 UNSPEC_DMF_RELOAD_FROM_MEMORY))
> +   (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))]
> +  "TARGET_DENSE_MATH"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  rtx tmp = operands[2];
> +  rtx mem_upper = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 0 : 64);
> +  rtx mem_lower = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 64 : 0);
> +
> +  emit_move_insn (tmp, mem_upper);
> +  emit_insn (gen_movtdo_insert512_upper (dest, tmp));
> +
> +  emit_move_insn (tmp, mem_lower);
> +  emit_insn (gen_movtdo_insert512_lower (dest, dest, tmp));
> +  DONE;
> +}
> +  [(set_attr "length" "16")
> +   (set_attr "max_prefixed_insns" "2")
> +   (set_attr "type" "vecload")])
> +
> +;; Reload dense math registers to memory
> +(define_insn_and_split "reload_dmf_to_memory"
> +  [(set (match_operand:TDO 0 "memory_operand" "=m")
> +     (unspec:TDO [(match_operand:TDO 1 "dmf_operand" "wD")]
> +                 UNSPEC_DMF_RELOAD_TO_MEMORY))
> +   (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))]
> +  "TARGET_DENSE_MATH"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  rtx tmp = operands[2];
> +  rtx mem_upper = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 0 : 64);
> +  rtx mem_lower = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 64 : 0);
> +
> +  emit_insn (gen_movtdo_extract512 (tmp, src, const0_rtx));
> +  emit_move_insn (mem_upper, tmp);
> +
> +  emit_insn (gen_movtdo_extract512 (tmp, src, const1_rtx));
> +  emit_move_insn (mem_lower, tmp);
> +  DONE;
> +}
> +  [(set_attr "length" "16")
> +   (set_attr "max_prefixed_insns" "2")])
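To double-check the endianness handling in reload_dmf_from_memory and reload_dmf_to_memory, here is a small host-side C model. It is not GCC code: dmf_half_offsets, model_dmf_store and model_dmf_load are hypothetical names, and memcpy stands in for the vector-pair memory accesses and the dmxxinstdmr512/dmxxextfdmr512 steps. It assumes, as the patterns do, that a 128-byte __dmf image moves as two 64-byte XOmode halves through one scratch register, with the upper half at offset 0 on big endian and at offset 64 on little endian.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sizes from the patch: a __dmf (TDOmode) value is 128 bytes and is moved
   as two 64-byte XOmode halves.  */
enum { DMF_BYTES = 128, XO_BYTES = 64 };
static_assert (DMF_BYTES == 2 * XO_BYTES, "a __dmf is two XOmode halves");

/* Hypothetical helper: the byte offsets handed to adjust_address () for
   the upper and lower halves.  */
static void
dmf_half_offsets (int big_endian, size_t *upper, size_t *lower)
{
  *upper = big_endian ? 0 : XO_BYTES;
  *lower = big_endian ? XO_BYTES : 0;
}

/* Store path: extract each half into the single scratch (operand 2 of the
   pattern), then store the scratch to that half's memory slot.  */
static void
model_dmf_store (unsigned char mem[DMF_BYTES],
                 const unsigned char dmr_upper[XO_BYTES],
                 const unsigned char dmr_lower[XO_BYTES], int big_endian)
{
  unsigned char tmp[XO_BYTES];
  size_t up, lo;

  dmf_half_offsets (big_endian, &up, &lo);
  memcpy (tmp, dmr_upper, XO_BYTES);   /* extract with index 0 */
  memcpy (mem + up, tmp, XO_BYTES);
  memcpy (tmp, dmr_lower, XO_BYTES);   /* extract with index 1 */
  memcpy (mem + lo, tmp, XO_BYTES);
}

/* Load path: load each half into the scratch, then insert it.  */
static void
model_dmf_load (unsigned char dmr_upper[XO_BYTES],
                unsigned char dmr_lower[XO_BYTES],
                const unsigned char mem[DMF_BYTES], int big_endian)
{
  unsigned char tmp[XO_BYTES];
  size_t up, lo;

  dmf_half_offsets (big_endian, &up, &lo);
  memcpy (tmp, mem + up, XO_BYTES);
  memcpy (dmr_upper, tmp, XO_BYTES);   /* insert upper half */
  memcpy (tmp, mem + lo, XO_BYTES);
  memcpy (dmr_lower, tmp, XO_BYTES);   /* insert lower half */
}
```

Storing and then loading with the same endianness returns the same two halves, which is the property the split relies on.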
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 6b7e5686f0c..a02e4cd03ef 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -495,6 +495,8 @@ const char *rs6000_type_string (tree type_node)
>      return "__vector_pair";
>    else if (type_node == vector_quad_type_node)
>      return "__vector_quad";
> +  else if (type_node == dmf_type_node)
> +    return "__dmf";
>  
>    return "unknown";
>  }
> @@ -781,6 +783,21 @@ rs6000_init_builtins (void)
>    t = build_qualified_type (vector_quad_type_node, TYPE_QUAL_CONST);
>    ptr_vector_quad_type_node = build_pointer_type (t);
>  
> +  /* For TDOmode (1,024 bit dense math accumulators), don't use an alignment of
> +     1,024, use 512.  TDOmode loads and stores are always broken up into 2
> +     vector pair loads or stores.  In addition, we don't have support for
> +     aligning the stack to 1,024 bits.  */
> +  dmf_type_node = make_node (OPAQUE_TYPE);
> +  SET_TYPE_MODE (dmf_type_node, TDOmode);
> +  TYPE_SIZE (dmf_type_node) = bitsize_int (GET_MODE_BITSIZE (TDOmode));
> +  TYPE_PRECISION (dmf_type_node) = GET_MODE_BITSIZE (TDOmode);
> +  TYPE_SIZE_UNIT (dmf_type_node) = size_int (GET_MODE_SIZE (TDOmode));
> +  SET_TYPE_ALIGN (dmf_type_node, 512);
> +  TYPE_USER_ALIGN (dmf_type_node) = 0;
> +  lang_hooks.types.register_builtin_type (dmf_type_node, "__dmf");
> +  t = build_qualified_type (dmf_type_node, TYPE_QUAL_CONST);
> +  ptr_dmf_type_node = build_pointer_type (t);
> +
>    tdecl = add_builtin_type ("__bool char", bool_char_type_node);
>    TYPE_NAME (bool_char_type_node) = tdecl;
>  
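The layout chosen in rs6000_init_builtins (a 1,024-bit size with only 512-bit alignment) can be mirrored with a plain C11 struct. dmf_image below is a hypothetical stand-in used only to show the size/alignment arithmetic; it is not the opaque __dmf type itself.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-in for the __dmf storage layout: TYPE_SIZE of 1,024
   bits, but SET_TYPE_ALIGN of only 512 bits.  */
typedef struct
{
  alignas (64) unsigned char bytes[128];
} dmf_image;

static_assert (sizeof (dmf_image) * 8 == 1024, "1,024-bit size");
static_assert (alignof (dmf_image) * 8 == 512, "512-bit alignment, not 1,024");
```

Because 128 is a multiple of 64, consecutive objects in an array stay 128 bytes apart and each still starts on a 64-byte boundary, without needing 1,024-bit stack alignment.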
> diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
> index 8fe5652442e..7541050ffe7 100644
> --- a/gcc/config/rs6000/rs6000-call.cc
> +++ b/gcc/config/rs6000/rs6000-call.cc
> @@ -437,14 +437,15 @@ rs6000_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
>    if (cfun
>        && !cfun->machine->mma_return_type_error
>        && TREE_TYPE (cfun->decl) == fntype
> -      && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode))
> +      && OPAQUE_MODE_P (TYPE_MODE (type)))
>      {
>        /* Record we have now handled function CFUN, so the next time we
>        are called, we do not re-report the same error.  */
>        cfun->machine->mma_return_type_error = true;
>        if (TYPE_CANONICAL (type) != NULL_TREE)
>       type = TYPE_CANONICAL (type);
> -      error ("invalid use of MMA type %qs as a function return value",
> +      error ("invalid use of %s type %qs as a function return value",
> +          (TYPE_MODE (type) == TDOmode) ? "dense math" : "MMA",
>            IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
>      }
>  
> @@ -1632,11 +1633,12 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
>    int n_elts;
>  
>    /* We do not allow MMA types being used as function arguments.  */
> -  if (mode == OOmode || mode == XOmode)
> +  if (OPAQUE_MODE_P (mode))
>      {
>        if (TYPE_CANONICAL (type) != NULL_TREE)
>       type = TYPE_CANONICAL (type);
> -      error ("invalid use of MMA operand of type %qs as a function parameter",
> +      error ("invalid use of %s operand of type %qs as a function parameter",
> +          (mode == TDOmode) ? "dense math" : "MMA",
>            IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
>        return NULL_RTX;
>      }
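The new diagnostics pick their wording from the mode. A minimal host-side sketch of that selection, assuming the only rs6000 opaque modes are OOmode, XOmode and TDOmode (the enum and function names are hypothetical, and the real code tests OPAQUE_MODE_P rather than an explicit list):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few machine modes.  */
enum model_mode { M_SI, M_V4SI, M_OO, M_XO, M_TDO };

static int
model_opaque_mode_p (enum model_mode m)
{
  return m == M_OO || m == M_XO || m == M_TDO;
}

/* Mirrors the checks in rs6000_function_arg and rs6000_return_in_memory:
   returns the word substituted for %s in the error message, or NULL when
   the mode is accepted as an argument or return value.  */
static const char *
model_opaque_diag_kind (enum model_mode m)
{
  if (!model_opaque_mode_p (m))
    return NULL;
  return m == M_TDO ? "dense math" : "MMA";
}
```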
> diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
> index f89e4ef403c..9a8b505ab6a 100644
> --- a/gcc/config/rs6000/rs6000-modes.def
> +++ b/gcc/config/rs6000/rs6000-modes.def
> @@ -79,3 +79,7 @@ PARTIAL_INT_MODE (TI, 128, PTI);
>  /* Modes used by __vector_pair and __vector_quad.  */
>  OPAQUE_MODE (OO, 32);
>  OPAQUE_MODE (XO, 64);
> +
> +/* Mode used by __dmf.  */
> +OPAQUE_MODE (TDO, 128);
> +
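TDOmode is 128 bytes, so, assuming 16-byte VSX registers (UNITS_PER_VSX_WORD), a __dmf held in VSX registers spans eight consecutive registers, and the *movtdo split addresses its two XOmode halves at regno and regno + 4. A small arithmetic sketch (vsx_regs_for and tdo_half_regnos are hypothetical helpers):

```c
#include <assert.h>

/* Byte sizes from rs6000-modes.def, plus an assumed 16-byte VSX register
   width (UNITS_PER_VSX_WORD).  */
enum { XO_BYTES = 64, TDO_BYTES = 128, VSX_REG_BYTES = 16 };

/* Number of consecutive VSX registers needed to hold BYTES, rounding up.  */
static int
vsx_regs_for (int bytes)
{
  return (bytes + VSX_REG_BYTES - 1) / VSX_REG_BYTES;
}

/* Mirrors the register-to-register case of the *movtdo split: the upper
   XOmode half starts at REGNO and the lower half starts one XOmode width
   (4 VSX registers) later.  */
static void
tdo_half_regnos (int regno, int *upper, int *lower)
{
  *upper = regno;
  *lower = regno + vsx_regs_for (XO_BYTES);
}
```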
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc

In this file, should we also update the rs6000_opaque_type_invalid_use_p
function to report that __dmf requires the "-mdense-math" flag?

> index 570e8a14f2d..635f05d0d02 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1842,7 +1842,8 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
>       128-bit floating point that can go in vector registers, which has VSX
>       memory addressing.  */
>    if (FP_REGNO_P (regno))
> -    reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode)
> +    reg_size = (VECTOR_MEM_VSX_P (mode)
> +             || VECTOR_ALIGNMENT_P (mode)
>               ? UNITS_PER_VSX_WORD
>               : UNITS_PER_FP_WORD);
>  
> @@ -1882,13 +1883,13 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
>       Because we just use the VSX registers for load/store operations, we just
>       need to make sure load vector pair and store vector pair instructions can
>       be used.  */
> -  if (mode == XOmode)
> +  if (mode == XOmode || mode == TDOmode)
>      {
>        if (!TARGET_MMA)
>       return 0;
>  
>        else if (!TARGET_DENSE_MATH)
> -     return (FP_REGNO_P (regno) && (regno & 3) == 0);
> +     return (mode == XOmode && FP_REGNO_P (regno) && (regno & 3) == 0);
>  
>        else if (DMF_REGNO_P (regno))
>       return 1;
> @@ -1899,7 +1900,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
>               && (regno & 1) == 0);
>      }
>  
> -  /* No other types other than XOmode can go in DMFs.  */
> +  /* No other types other than XOmode can go in dense math registers.  */
>    if (DMF_REGNO_P (regno))
>      return 0;
>  
> @@ -2007,9 +2008,11 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>     GPR registers, and TImode can go in any GPR as well as VSX registers (PR
>     57744).
>  
> -   Similarly, don't allow OOmode (vector pair, restricted to even VSX
> -   registers) or XOmode (vector quad, restricted to FPR registers divisible
> -   by 4) to tie with other modes.
> +   Similarly, don't allow OOmode (vector pair), XOmode (vector quad), or
> +   TDOmode (dense math register) to pair with anything else.  Vector pairs are
> +   restricted to even/odd VSX registers.  Without dense math, vector quads are
> +   limited to FPR registers divisible by 4.  With dense math, vector quads are
> +   limited to even VSX registers or DMF registers.
>  
>     Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
>     128-bit floating point on VSX systems ties with other vectors.  */
> @@ -2018,7 +2021,8 @@ static bool
>  rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>  {
>    if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
> -      || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
> +      || mode1 == TDOmode || mode2 == PTImode || mode2 == OOmode
> +      || mode2 == XOmode || mode2 == TDOmode)
>      return mode1 == mode2;
>  
>    if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
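The updated first check in rs6000_modes_tieable_p treats TDOmode like PTImode, OOmode and XOmode: such a mode only ties with itself. A host-side sketch of just that check, using a hypothetical mode enum; the later vector/float/integer rules are deliberately not modeled and are reported as "undecided":

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of machine modes, for illustration only.  */
enum model_mode { M_SI, M_DF, M_V4SI, M_V2DF, M_PTI, M_OO, M_XO, M_TDO };

static bool
model_tie_restricted_p (enum model_mode m)
{
  return m == M_PTI || m == M_OO || m == M_XO || m == M_TDO;
}

/* Models only the first test of rs6000_modes_tieable_p after the patch.
   Returns 1 (tieable), 0 (not tieable), or -1 when the answer is left to
   the later rules, which this sketch does not model.  */
static int
model_modes_tieable_first_check (enum model_mode a, enum model_mode b)
{
  if (model_tie_restricted_p (a) || model_tie_restricted_p (b))
    return a == b;
  return -1;
}
```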
> @@ -2309,6 +2313,7 @@ rs6000_debug_reg_global (void)
>      V4DFmode,
>      OOmode,
>      XOmode,
> +    TDOmode,
>      CCmode,
>      CCUNSmode,
>      CCEQmode,
> @@ -2674,7 +2679,7 @@ rs6000_setup_reg_addr_masks (void)
>         /* Special case DMF registers.  */
>         if (rc == RELOAD_REG_DMF)
>           {
> -           if (TARGET_DENSE_MATH && m2 == XOmode)
> +           if (TARGET_DENSE_MATH && (m2 == XOmode || m2 == TDOmode))
>               {
>                 addr_mask = RELOAD_REG_VALID;
>                 reg_addr[m].addr_mask[rc] = addr_mask;
> @@ -2781,10 +2786,10 @@ rs6000_setup_reg_addr_masks (void)
>  
>         /* Vector pairs can do both indexed and offset loads if the
>            instructions are enabled, otherwise they can only do offset loads
> -          since it will be broken into two vector moves.  Vector quads can
> -          only do offset loads.  */
> +          since it will be broken into two vector moves.  Vector quads and
> +          dense math types can only do offset loads.  */
>         else if ((addr_mask != 0) && TARGET_MMA
> -                && (m2 == OOmode || m2 == XOmode))
> +                && (m2 == OOmode || m2 == XOmode || m2 == TDOmode))
>           {
>             addr_mask |= RELOAD_REG_OFFSET;
>             if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
> @@ -3012,6 +3017,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>        rs6000_vector_align[XOmode] = 512;
>      }
>  
> +  /* Add support for 1,024 bit DMF registers.  */
> +  if (TARGET_DENSE_MATH)
> +    {
> +      rs6000_vector_unit[TDOmode] = VECTOR_NONE;
> +      rs6000_vector_mem[TDOmode] = VECTOR_VSX;
> +      rs6000_vector_align[TDOmode] = 512;
> +    }
> +
>    /* Register class constraints for the constraints that depend on compile
>       switches. When the VSX code was added, different constraints were added
>       based on the type (DFmode, V2DFmode, V4SFmode).  For the vector types, all
> @@ -3224,6 +3237,12 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>       }
>      }
>  
> +  if (TARGET_DENSE_MATH)
> +    {
> +      reg_addr[TDOmode].reload_load = CODE_FOR_reload_dmf_from_memory;
> +      reg_addr[TDOmode].reload_store = CODE_FOR_reload_dmf_to_memory;
> +    }
> +
>    /* Precalculate HARD_REGNO_NREGS.  */
>    for (r = 0; HARD_REGISTER_NUM_P (r); ++r)
>      for (m = 0; m < NUM_MACHINE_MODES; ++m)
> @@ -8738,12 +8757,15 @@ reg_offset_addressing_ok_p (machine_mode mode)
>       return mode_supports_dq_form (mode);
>        break;
>  
> -      /* The vector pair/quad types support offset addressing if the
> -      underlying vectors support offset addressing.  */
> +      /* The vector pair/quad types and the dense math types support offset
> +      addressing if the underlying vectors support offset addressing.  */
>      case E_OOmode:
>      case E_XOmode:
>        return TARGET_MMA;
>  
> +    case E_TDOmode:
> +      return TARGET_DENSE_MATH;
> +
>      case E_SDmode:
>        /* If we can do direct load/stores of SDmode, restrict it to reg+reg
>        addressing for the LFIWZX and STFIWX instructions.  */
> @@ -11297,6 +11319,12 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
>              (mode == OOmode) ? "__vector_pair" : "__vector_quad");
>        break;
>  
> +    case E_TDOmode:
> +      if (CONST_INT_P (operands[1]))
> +     error ("%qs is an opaque type, and you cannot set it to constants",
> +            "__dmf");
> +      break;
> +
>      case E_SImode:
>      case E_DImode:
>        /* Use default pattern for address of ELF small data */
> @@ -12760,7 +12788,7 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
>  
>    /* We can transfer between VSX registers and DMF registers without needing
>       extra registers.  */
> -  if (TARGET_DENSE_MATH && mode == XOmode
> +  if (TARGET_DENSE_MATH && (mode == XOmode || mode == TDOmode)
>        && ((to_type == DMF_REG_TYPE && from_type == VSX_REG_TYPE)
>         || (to_type == VSX_REG_TYPE && from_type == DMF_REG_TYPE)))
>      return true;
> @@ -13561,6 +13589,9 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
>        if (mode == XOmode)
>       return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
>  
> +      if (mode == TDOmode)
> +     return VSX_REGS;
> +
>        if (GET_MODE_CLASS (mode) == MODE_INT)
>       return GENERAL_REGS;
>      }
> @@ -20740,6 +20771,8 @@ rs6000_mangle_type (const_tree type)
>      return "u13__vector_pair";
>    if (type == vector_quad_type_node)
>      return "u13__vector_quad";
> +  if (type == dmf_type_node)
> +    return "u5__dmf";
>  
>    /* For all other types, use the default mangling.  */
>    return NULL;
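The "u5__dmf" string follows the Itanium C++ ABI encoding for vendor-extended builtin types: 'u' followed by a <source-name>, i.e. the identifier's length and then the identifier, the same scheme as "u13__vector_quad". A sketch that builds such names (vendor_type_mangling is a hypothetical helper; GCC simply returns the literal string):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "u" <length> <identifier> into BUF.  Returns the number of
   characters written (excluding the NUL), or -1 if BUF is too small.  */
static int
vendor_type_mangling (const char *name, char *buf, size_t bufsize)
{
  int n = snprintf (buf, bufsize, "u%zu%s", strlen (name), name);
  if (n < 0 || (size_t) n >= bufsize)
    return -1;
  return n;
}
```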
> @@ -22870,6 +22903,10 @@ rs6000_dmf_register_move_cost (machine_mode mode, reg_class_t rclass)
>        if (mode == XOmode)
>       return reg_move_base;
>  
> +      /* __dmf (i.e. TDOmode) is transferred in 2 instructions.  */
> +      else if (mode == TDOmode)
> +     return reg_move_base * 2;
> +
>        else
>       return reg_move_base * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
>      }
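For the DMF register class, the cost rule after this patch is: XOmode costs the base cost, TDOmode costs twice the base (it is transferred in two 512-bit instructions), and any other mode falls back to the multi-register formula. A hypothetical model of just that branch, where base stands in for reg_move_base and nregs for hard_regno_nregs (FIRST_DMF_REGNO, mode):

```c
#include <assert.h>

/* Hypothetical classification of the mode being moved.  */
enum dmf_mode_kind { KIND_XO, KIND_TDO, KIND_OTHER };

/* Models the DMF branch of rs6000_dmf_register_move_cost after the patch.  */
static int
model_dmf_move_cost (enum dmf_mode_kind kind, int base, int nregs)
{
  if (kind == KIND_XO)
    return base;              /* XOmode: base cost only */
  else if (kind == KIND_TDO)
    return base * 2;          /* TDOmode: two 512-bit insert/extract insns */
  else
    return base * 2 * nregs;  /* generic multi-register fallback */
}
```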
> @@ -27556,9 +27593,10 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>    mode = GET_MODE (dst);
>    nregs = hard_regno_nregs (reg, mode);
>  
> -  /* If we have a vector quad register for MMA, and this is a load or store,
> -     see if we can use vector paired load/stores.  */
> -  if (mode == XOmode && TARGET_MMA
> +  /* If we have a vector quad register for MMA or DMF register for dense math,
> +     and this is a load or store, see if we can use vector paired
> +     load/stores.  */
> +  if ((mode == XOmode || mode == TDOmode) && TARGET_MMA
>        && (MEM_P (dst) || MEM_P (src)))
>      {
>        reg_mode = OOmode;
> @@ -27566,7 +27604,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>      }
>    /* If we have a vector pair/quad mode, split it into two/four separate
>       vectors.  */
> -  else if (mode == OOmode || mode == XOmode)
> +  else if (mode == OOmode || mode == XOmode || mode == TDOmode)
>      reg_mode = V1TImode;
>    else if (FP_REGNO_P (reg))
>      reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
> @@ -27612,13 +27650,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>        return;
>      }
>  
> -  /* The __vector_pair and __vector_quad modes are multi-register
> -     modes, so if we have to load or store the registers, we have to be
> -     careful to properly swap them if we're in little endian mode
> -     below.  This means the last register gets the first memory
> -     location.  We also need to be careful of using the right register
> -     numbers if we are splitting XO to OO.  */
> -  if (mode == OOmode || mode == XOmode)
> +  /* The __vector_pair, __vector_quad, and __dmf modes are multi-register
> +     modes, so if we have to load or store the registers, we have to be careful
> +     to properly swap them if we're in little endian mode below.  This means
> +     the last register gets the first memory location.  We also need to be
> +     careful of using the right register numbers if we are splitting XO to
> +     OO.  */
> +  if (mode == OOmode || mode == XOmode || mode == TDOmode)
>      {
>        nregs = hard_regno_nregs (reg, mode);
>        int reg_mode_nregs = hard_regno_nregs (reg, reg_mode);
> @@ -27755,7 +27793,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>        overlap.  */
>        int i;
>        /* XO/OO are opaque so cannot use subregs. */
> -      if (mode == OOmode || mode == XOmode )
> +      if (mode == OOmode || mode == XOmode || mode == TDOmode)
>       {
>         for (i = nregs - 1; i >= 0; i--)
>           {
> @@ -27929,7 +27967,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>           continue;
>  
>         /* XO/OO are opaque so cannot use subregs. */
> -       if (mode == OOmode || mode == XOmode )
> +       if (mode == OOmode || mode == XOmode || mode == TDOmode)
>           {
>             rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j);
>             rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j);
> @@ -28957,7 +28995,8 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
>  
>    if (frommode != tomode)
>      {
> -      /* Do not allow conversions to/from XOmode and OOmode types.  */
> +      /* Do not allow conversions to/from XOmode, OOmode, and TDOmode
> +      types.  */
>        if (frommode == XOmode)
>       return N_("invalid conversion from type %<__vector_quad%>");
>        if (tomode == XOmode)
> @@ -28966,6 +29005,10 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
>       return N_("invalid conversion from type %<__vector_pair%>");
>        if (tomode == OOmode)
>       return N_("invalid conversion to type %<__vector_pair%>");
> +      if (frommode == TDOmode)
> +     return N_("invalid conversion from type %<__dmf%>");
> +      if (tomode == TDOmode)
> +     return N_("invalid conversion to type %<__dmf%>");
>      }
>  
>    /* Conversion allowed.  */
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 169d81e208e..cae8f269cf1 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -986,7 +986,7 @@ enum data_align { align_abi, align_opt, align_both };
>  /* Modes that are not vectors, but require vector alignment.  Treat these like
>     vectors in terms of loads and stores.  */
>  #define VECTOR_ALIGNMENT_P(MODE)                                      \
> -  (FLOAT128_VECTOR_P (MODE) || (MODE) == OOmode || (MODE) == XOmode)
> +  (FLOAT128_VECTOR_P (MODE) || OPAQUE_MODE_P (MODE))
>  
>  #define ALTIVEC_VECTOR_MODE(MODE)                                     \
>    ((MODE) == V16QImode                                                     \
> @@ -2277,6 +2277,7 @@ enum rs6000_builtin_type_index
>    RS6000_BTI_const_str,               /* pointer to const char * */
>    RS6000_BTI_vector_pair,     /* unsigned 256-bit types (vector pair).  */
>    RS6000_BTI_vector_quad,     /* unsigned 512-bit types (vector quad).  */
> +  RS6000_BTI_dmf,             /* unsigned 1,024-bit types (dmf).  */
>    RS6000_BTI_const_ptr_void,     /* const pointer to void */
>    RS6000_BTI_ptr_V16QI,
>    RS6000_BTI_ptr_V1TI,
> @@ -2315,6 +2316,7 @@ enum rs6000_builtin_type_index
>    RS6000_BTI_ptr_dfloat128,
>    RS6000_BTI_ptr_vector_pair,
>    RS6000_BTI_ptr_vector_quad,
> +  RS6000_BTI_ptr_dmf,
>    RS6000_BTI_ptr_long_long,
>    RS6000_BTI_ptr_long_long_unsigned,
>    RS6000_BTI_MAX
> @@ -2372,6 +2374,7 @@ enum rs6000_builtin_type_index
>  #define const_str_type_node          (rs6000_builtin_types[RS6000_BTI_const_str])
>  #define vector_pair_type_node                (rs6000_builtin_types[RS6000_BTI_vector_pair])
>  #define vector_quad_type_node                (rs6000_builtin_types[RS6000_BTI_vector_quad])
> +#define dmf_type_node                        (rs6000_builtin_types[RS6000_BTI_dmf])
>  #define pcvoid_type_node             (rs6000_builtin_types[RS6000_BTI_const_ptr_void])
>  #define ptr_V16QI_type_node          (rs6000_builtin_types[RS6000_BTI_ptr_V16QI])
>  #define ptr_V1TI_type_node           (rs6000_builtin_types[RS6000_BTI_ptr_V1TI])
> @@ -2410,6 +2413,7 @@ enum rs6000_builtin_type_index
>  #define ptr_dfloat128_type_node              (rs6000_builtin_types[RS6000_BTI_ptr_dfloat128])
>  #define ptr_vector_pair_type_node    (rs6000_builtin_types[RS6000_BTI_ptr_vector_pair])
>  #define ptr_vector_quad_type_node    (rs6000_builtin_types[RS6000_BTI_ptr_vector_quad])
> +#define ptr_dmf_type_node            (rs6000_builtin_types[RS6000_BTI_ptr_dmf])
>  #define ptr_long_long_integer_type_node      (rs6000_builtin_types[RS6000_BTI_ptr_long_long])
>  #define ptr_long_long_unsigned_type_node (rs6000_builtin_types[RS6000_BTI_ptr_long_long_unsigned])
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c b/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c
> new file mode 100644
> index 00000000000..1d52184c998
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c
> @@ -0,0 +1,63 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_dense_math_ok } */
> +/* { dg-options "-mdejagnu-cpu=future -O2" } */
> +
> +/* Test basic load/store for __dmf type.  */
> +
> +#ifndef CONSTRAINT
> +#if defined(USE_D)
> +#define CONSTRAINT "d"
> +
> +#elif defined(USE_V)
> +#define CONSTRAINT "v"
> +
> +#elif defined(USE_WA)
> +#define CONSTRAINT "wa"
> +
> +#else
> +#define CONSTRAINT "wD"
> +#endif
> +#endif
> +const char constraint[] = CONSTRAINT;
> +
> +void foo_mem_asm (__dmf *p, __dmf *q)
> +{
> +  /* 2 LXVP instructions.  */
> +  __dmf vq = *p;
> +
> +  /* 2 DMXXINSTDMR512 instructions to transfer VSX to DMF.  */
> +  __asm__ ("# foo (" CONSTRAINT ") %A0" : "+" CONSTRAINT (vq));
> +  /* 2 DMXXEXTFDMR512 instructions to transfer DMF to VSX.  */
> +
> +  /* 2 STXVP instructions.  */
> +  *q = vq;
> +}
> +
> +void foo_mem_asm2 (__dmf *p, __dmf *q)
> +{
> +  /* 2 LXVP instructions.  */
> +  __dmf vq = *p;
> +  __dmf vq2;
> +  __dmf vq3;
> +
> +  /* 2 DMXXINSTDMR512 instructions to transfer VSX to DMF.  */
> +  __asm__ ("# foo1 (" CONSTRAINT ") %A0" : "+" CONSTRAINT (vq));
> +  /* 2 DMXXEXTFDMR512 instructions to transfer DMF to VSX.  */
> +
> +  vq2 = vq;
> +  __asm__ ("# foo2 (wa) %0" : "+wa" (vq2));
> +
> +  /* 2 STXVP instructions.  */
> +  *q = vq2;
> +}
> +
> +void foo_mem (__dmf *p, __dmf *q)
> +{
> +  /* 2 LXVP, 2 STXVP instructions, no DMF transfer.  */
> +  *q = *p;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mdmxxextfdmr512\M}  4 } } */
> +/* { dg-final { scan-assembler-times {\mdmxxinstdmr512\M}  4 } } */
> +/* { dg-final { scan-assembler-times {\mlxvp\M}           12 } } */
> +/* { dg-final { scan-assembler-times {\mstxvp\M}          12 } } */
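As a sanity check on the dg-final counts, here is the arithmetic, assuming each lxvp/stxvp moves one 32-byte vector pair and each dmxxinstdmr512/dmxxextfdmr512 moves one 512-bit half. Under those assumptions, the totals of 12 correspond to four vector-pair loads (and four stores) per 128-byte __dmf copy across the three functions, and the totals of 4 to one VSX-to-DMF and one DMF-to-VSX transfer in each of the two asm functions. The helper names are hypothetical:

```c
#include <assert.h>

/* Assumed transfer widths in bytes.  */
enum { DMF_BYTES = 128, PAIR_BYTES = 32, HALF_BYTES = 64 };

/* lxvp (or stxvp) instructions for one 128-byte __dmf load (or store).  */
static int
pairs_per_dmf_copy (void)
{
  return DMF_BYTES / PAIR_BYTES;
}

/* 512-bit insert (or extract) instructions per VSX<->DMF transfer.  */
static int
halves_per_dmf_transfer (void)
{
  return DMF_BYTES / HALF_BYTES;
}

/* Each test function loads *p once and stores *q once.  */
static int
expected_lxvp (int functions)
{
  return functions * pairs_per_dmf_copy ();
}

/* foo_mem_asm and foo_mem_asm2 each move the value into a DMR for their
   "wD" asm operand and back out once; foo_mem makes no DMF transfer.  */
static int
expected_inserts (int functions_with_wd_asm)
{
  return functions_with_wd_asm * halves_per_dmf_transfer ();
}
```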
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 67f1a3c8230..4f9a79702cb 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -7839,6 +7839,41 @@ proc check_effective_target_power10_ok { } {
>      }
>  }
>  
> +# Return 1 if this is a PowerPC target supporting -mcpu=future which enables
> +# some potential new instructions.
> +proc check_effective_target_powerpc_future_ok { } {
> +       return [check_no_compiler_messages powerpc_future_ok object {
> +           #ifndef _ARCH_PWR_FUTURE
> +           #error "-mcpu=future is not supported"
> +           #else
> +           int dummy;
> +           #endif
> +       } "-mcpu=future"]
> +}
> +
> +# Return 1 if this is a PowerPC target supporting -mcpu=future which enables
> +# the dense math operations.
> +proc check_effective_target_powerpc_dense_math_ok { } {
> +    if { ([istarget powerpc*-*-*]) } {
> +     return [check_no_compiler_messages powerpc_dense_math_ok object {
> +         __vector_quad vq;
> +         int main (void) {
> +             #ifndef __DENSE_MATH__
> +             #error "target does not have dense math support."
> +             #else
> +             /* Make sure we have dense math support.  */
> +               __vector_quad dmr;
> +               __asm__ ("dmsetaccz %A0" : "=wD" (dmr));
> +               vq = dmr;
> +             #endif
> +             return 0;
> +         }
> +     } "-mcpu=future"]
> +    } else {
> +     return 0;
> +    }
> +}
> +
>  # Return 1 if this is a PowerPC target supporting -mfloat128 via either
>  # software emulation on power7/power8 systems or hardware support on power9.
>  
> -- 
> 2.51.1
>
