[PATCH 5/7] Add support for 1,024 bit dense math registers

Michael Meissner Wed, 01 Jul 2026 11:54:18 -0700

This is part five of the dense math register patches for the PowerPC.
This is the 7th version of the dense math patches.


Version 6 of the dense math register patches was posted on April 21st,
2026:

 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713352.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713353.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713354.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713355.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713356.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713357.html

This patch needs the -mcpu=future patch posted on April 8th, 2026:

  * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712532.html

This patch is functionally the same as the version 6 patch, except that I made
the same name changes that I described in the previous patch.

This patch (#5) is a preliminary patch to add the full 1,024 bit dense math
registers (DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the
upper 512 bits of a DMR.
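
To make the size relationships concrete, here is a small standalone C sketch
(not GCC code).  The byte sizes come from the OPAQUE_MODE entries in
rs6000-modes.def (OO 32, XO 64, TDO 128), and a VSX register is 16 bytes, so a
1,024 bit value is two 512 bit accumulator-sized halves, or eight VSX
registers.  The register split mirrors the *movtdo splitter in this patch,
which uses VSX register N for the upper 512 bits and N + 4 for the lower 512
bits.  The helper names vsx_regs_needed and tdo_half_first_reg are
hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Byte sizes taken from the OPAQUE_MODE entries in rs6000-modes.def;
   VSX_REG_BYTES is the width of one 128-bit VSX register.  */
enum
{
  VSX_REG_BYTES = 16,
  OO_BYTES = 32,   /* __vector_pair  */
  XO_BYTES = 64,   /* __vector_quad (512-bit accumulator)  */
  TDO_BYTES = 128  /* __dm1024 (1,024-bit dense math register)  */
};

/* A 1,024-bit value is exactly two 512-bit halves.  */
_Static_assert (TDO_BYTES == 2 * XO_BYTES, "TDO must be two XO halves");

/* Hypothetical helper: consecutive VSX registers needed for SIZE bytes.  */
static int
vsx_regs_needed (int size)
{
  return size / VSX_REG_BYTES;
}

/* Hypothetical helper: first VSX register of the upper (HALF == 0) or lower
   (HALF == 1) 512-bit half of a TDOmode value that starts at VSX register
   REGNO, mirroring the REGNO / REGNO + 4 split in the *movtdo splitter.  */
static int
tdo_half_first_reg (int regno, int half)
{
  assert (half == 0 || half == 1);
  return regno + half * vsx_regs_needed (XO_BYTES);
}
```

For example, vsx_regs_needed (TDO_BYTES) is 8, and for a value that starts at
vs0 the lower half starts at vs4.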

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I added the new opaque mode 'TDOmode' for 1,024 bit registers.  The 'wD'
constraint added in previous patches is used for these registers.  Since there
are no dense math load/store instructions, I added support to load and store
DMRs through the VSX registers.  I added the new keyword '__dm1024' to create
1,024 bit types that can be loaded into dense math registers.
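
The load path through the VSX registers can be modelled the same way.  The
sketch below is a standalone C model, not GCC code, of what
reload_tdo_from_memory does after it is split.  The upper 512 bits live at
byte offset 0 on big endian and at offset 64 on little endian, matching the
adjust_address calls in the pattern.  Each half passes through a single
64-byte temporary that stands in for the XOmode VSX scratch register the
pattern clobbers.  The function names, and the representation of the dense
math register as a 128-byte array with its upper half first, are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { XO_BYTES = 64, TDO_BYTES = 128 };

/* Byte offset of the upper (HALF == 0) or lower (HALF == 1) 512-bit half of
   a 128-byte __dm1024 object in memory, matching the adjust_address calls in
   reload_tdo_from_memory and reload_tdo_to_memory.  */
static size_t
tdo_half_offset (int big_endian, int half)
{
  assert (half == 0 || half == 1);
  if (half == 0)
    return big_endian ? 0 : XO_BYTES;
  return big_endian ? XO_BYTES : 0;
}

/* Hypothetical model of the split reload: REG is the dense math register
   image with the upper half in bytes 0..63, and TMP plays the role of the
   XOmode VSX scratch register.  */
static void
model_reload_from_memory (unsigned char reg[TDO_BYTES],
                          const unsigned char mem[TDO_BYTES], int big_endian)
{
  unsigned char tmp[XO_BYTES];

  for (int half = 0; half < 2; half++)
    {
      /* Load one 512-bit half from memory into the scratch register...  */
      memcpy (tmp, mem + tdo_half_offset (big_endian, half), XO_BYTES);
      /* ...then insert it into the DMR (dmxxinstdmr512 with index HALF).  */
      memcpy (reg + half * XO_BYTES, tmp, XO_BYTES);
    }
}
```

On little endian, reg[0] receives mem[64]: the upper half of the register
comes from the second 64 bytes of the object in memory.  The store path
(reload_tdo_to_memory) is the same sequence in reverse, using
dmxxextfdmr512.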

I have committed all of the patches in my backlog (dense math registers, other
-mcpu=future instructions, random bug fixes, support for _Float16 and
__bfloat16, and optimizations for vector logical operations on power10/power11)
into the IBM vendor branch:

        vendors/ibm/gcc-17-future

I have bootstrapped little endian compilers on power10 systems and big endian
compilers on power9 systems.  There were no regressions in the tests.  Can I
add the patches to the GCC trunk?

2026-07-01   Michael Meissner  <[email protected]>

gcc/

        * config/rs6000/mma.md (UNSPEC_DMF_INSERT512_UPPER): New unspec.
        (UNSPEC_DMF_INSERT512_LOWER): Likewise.
        (UNSPEC_DMF_EXTRACT512): Likewise.
        (UNSPEC_DMF_RELOAD_FROM_MEMORY): Likewise.
        (UNSPEC_DMF_RELOAD_TO_MEMORY): Likewise.
        (movtdo): New define_expand and define_insn_and_split to implement 1,024
        bit dense math registers.
        (movtdo_insert512_upper): New insn.
        (movtdo_insert512_lower): Likewise.
        (movtdo_extract512): Likewise.
        (reload_tdo_from_memory): Likewise.
        (reload_tdo_to_memory): Likewise.
        * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add dense math
        register support.
        (rs6000_init_builtins): Add support for __dm1024 keyword.
        * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
        for TDOmode.
        (rs6000_function_arg): Likewise.
        * config/rs6000/rs6000-modes.def (TDOmode): New mode.
        * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Add
        support for TDOmode.
        (rs6000_hard_regno_mode_ok): Likewise.
        (rs6000_modes_tieable_p): Likewise.
        (rs6000_debug_reg_global): Likewise.
        (rs6000_setup_reg_addr_masks): Likewise.
        (rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Set up reload
        hooks for dense math TDO reload mode.
        (reg_offset_addressing_ok_p): Add support for TDOmode.
        (rs6000_emit_move): Likewise.
        (rs6000_secondary_reload_simple_move): Likewise.
        (rs6000_preferred_reload_class): Likewise.
        (rs6000_mangle_type): Add mangling for __dm1024 type.
        (rs6000_dmf_register_move_cost): Add support for TDOmode.
        (rs6000_split_multireg_move): Likewise.
        (rs6000_invalid_conversion): Likewise.
        * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
        (enum rs6000_builtin_type_index): Add dense math register type nodes.
        (dm1024_type_node): Likewise.
        (ptr_dm1024_type_node): Likewise.

gcc/testsuite/

        * gcc.target/powerpc/dm-1024bit.c: New test.

---
 gcc/config/rs6000/mma.md            | 155 ++++++++++++++++++++++++++++
 gcc/config/rs6000/rs6000-builtin.cc |  17 +++
 gcc/config/rs6000/rs6000-call.cc    |  10 +-
 gcc/config/rs6000/rs6000-modes.def  |   4 +
 gcc/config/rs6000/rs6000.cc         | 109 ++++++++++++++-----
 gcc/config/rs6000/rs6000.h          |   6 +-
 6 files changed, 270 insertions(+), 31 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index c017b7ca1e7..95cee85925b 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -92,6 +92,11 @@ (define_c_enum "unspec"
    UNSPEC_MMA_XXMFACC
    UNSPEC_MMA_XXMTACC
    UNSPEC_MMA_DMSETDMRZ
+   UNSPEC_DMF_INSERT512_UPPER
+   UNSPEC_DMF_INSERT512_LOWER
+   UNSPEC_DMF_EXTRACT512
+   UNSPEC_DMF_RELOAD_FROM_MEMORY
+   UNSPEC_DMF_RELOAD_TO_MEMORY
   ])
 
 (define_c_enum "unspecv"
@@ -811,3 +816,153 @@ (define_insn "mma_<avvi4i4i4>"
   "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
    (set_attr "prefixed" "yes")])
+
+;; TDOmode (__dm1024 keyword for 1,024 bit registers).
+(define_expand "movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand")
+       (match_operand:TDO 1 "input_operand"))]
+  "TARGET_DENSE_MATH"
+{
+  rs6000_emit_move (operands[0], operands[1], TDOmode);
+  DONE;
+})
+
+(define_insn_and_split "*movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa")
+       (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))]
+  "TARGET_DENSE_MATH
+   && (gpc_reg_operand (operands[0], TDOmode)
+       || gpc_reg_operand (operands[1], TDOmode))"
+  "@
+   #
+   #
+   #
+   #
+   dmmr %0,%1
+   #"
+  "&& reload_completed
+   && (!dmf_register_operand (operands[0], TDOmode)
+       || !dmf_register_operand (operands[1], TDOmode))"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+
+  if (REG_P (op0) && REG_P (op1))
+    {
+      int regno0 = REGNO (op0);
+      int regno1 = REGNO (op1);
+
+      if (DMF_REGNO_P (regno0) && VSX_REGNO_P (regno1))
+       {
+         rtx op1_upper = gen_rtx_REG (XOmode, regno1);
+         rtx op1_lower = gen_rtx_REG (XOmode, regno1 + 4);
+         emit_insn (gen_movtdo_insert512_upper (op0, op1_upper));
+         emit_insn (gen_movtdo_insert512_lower (op0, op0, op1_lower));
+         DONE;
+       }
+
+      else if (VSX_REGNO_P (regno0) && DMF_REGNO_P (regno1))
+       {
+         rtx op0_upper = gen_rtx_REG (XOmode, regno0);
+         rtx op0_lower = gen_rtx_REG (XOmode, regno0 + 4);
+         emit_insn (gen_movtdo_extract512 (op0_upper, op1, const0_rtx));
+         emit_insn (gen_movtdo_extract512 (op0_lower, op1, const1_rtx));
+         DONE;
+       }
+
+     else
+       gcc_assert (VSX_REGNO_P (regno0) && VSX_REGNO_P (regno1));
+    }
+
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,vecmove,vecmove,vecmove,vecmove")
+   (set_attr "length" "*,*,32,8,*,8")
+   (set_attr "max_prefixed_insns" "4,4,*,*,*,*")])
+
+;; Move from VSX registers to dense math registers via two insert 512 bit
+;; instructions.
+(define_insn "movtdo_insert512_upper"
+  [(set (match_operand:TDO 0 "dmf_register_operand" "=wD")
+       (unspec:TDO [(match_operand:XO 1 "vsx_register_operand" "wa")]
+                   UNSPEC_DMF_INSERT512_UPPER))]
+  "TARGET_DENSE_MATH"
+  "dmxxinstdmr512 %0,%1,%Y1,0"
+  [(set_attr "type" "mma")])
+
+(define_insn "movtdo_insert512_lower"
+  [(set (match_operand:TDO 0 "dmf_register_operand" "=wD")
+       (unspec:TDO [(match_operand:TDO 1 "dmf_register_operand" "0")
+                    (match_operand:XO 2 "vsx_register_operand" "wa")]
+                   UNSPEC_DMF_INSERT512_LOWER))]
+  "TARGET_DENSE_MATH"
+  "dmxxinstdmr512 %0,%2,%Y2,1"
+  [(set_attr "type" "mma")])
+
+;; Move from dense math registers to VSX registers via two extract 512 bit
+;; instructions.
+(define_insn "movtdo_extract512"
+  [(set (match_operand:XO 0 "vsx_register_operand" "=wa")
+       (unspec:XO [(match_operand:TDO 1 "dmf_register_operand" "wD")
+                   (match_operand 2 "const_0_to_1_operand" "n")]
+                  UNSPEC_DMF_EXTRACT512))]
+  "TARGET_DENSE_MATH"
+  "dmxxextfdmr512 %0,%Y0,%1,%2"
+  [(set_attr "type" "mma")])
+
+;; Reload dense math registers from memory.
+(define_insn_and_split "reload_tdo_from_memory"
+  [(set (match_operand:TDO 0 "dmf_register_operand" "=wD")
+       (unspec:TDO [(match_operand:TDO 1 "memory_operand" "m")]
+                   UNSPEC_DMF_RELOAD_FROM_MEMORY))
+   (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))]
+  "TARGET_DENSE_MATH"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx mem_upper = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 0 : 64);
+  rtx mem_lower = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 64 : 0);
+
+  emit_move_insn (tmp, mem_upper);
+  emit_insn (gen_movtdo_insert512_upper (dest, tmp));
+
+  emit_move_insn (tmp, mem_lower);
+  emit_insn (gen_movtdo_insert512_lower (dest, dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")
+   (set_attr "max_prefixed_insns" "2")
+   (set_attr "type" "vecload")])
+
+;; Reload dense math registers to memory
+(define_insn_and_split "reload_tdo_to_memory"
+  [(set (match_operand:TDO 0 "memory_operand" "=m")
+       (unspec:TDO [(match_operand:TDO 1 "dmf_register_operand" "wD")]
+                   UNSPEC_DMF_RELOAD_TO_MEMORY))
+   (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))]
+  "TARGET_DENSE_MATH"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx mem_upper = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 0 : 64);
+  rtx mem_lower = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 64 : 0);
+
+  emit_insn (gen_movtdo_extract512 (tmp, src, const0_rtx));
+  emit_move_insn (mem_upper, tmp);
+
+  emit_insn (gen_movtdo_extract512 (tmp, src, const1_rtx));
+  emit_move_insn (mem_lower, tmp);
+  DONE;
+}
+  [(set_attr "length" "16")
+   (set_attr "max_prefixed_insns" "2")])
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 23ba2a5a9a2..0a9c06e41ea 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -512,6 +512,8 @@ const char *rs6000_type_string (tree type_node)
     return "__vector_pair";
   else if (type_node == vector_quad_type_node)
     return "__vector_quad";
+  else if (type_node == dm1024_type_node)
+    return "__dm1024";
 
   return "unknown";
 }
@@ -813,6 +815,21 @@ rs6000_init_builtins (void)
   t = build_qualified_type (vector_quad_type_node, TYPE_QUAL_CONST);
   ptr_vector_quad_type_node = build_pointer_type (t);
 
+  /* For TDOmode (1,024 bit dense math accumulators), don't use an alignment of
+     1,024, use 512.  TDOmode loads and stores are always broken up into 2
+     vector pair loads or stores.  In addition, we don't have support for
+     aligning the stack to 1,024 bits.  */
+  dm1024_type_node = make_node (OPAQUE_TYPE);
+  SET_TYPE_MODE (dm1024_type_node, TDOmode);
+  TYPE_SIZE (dm1024_type_node) = bitsize_int (GET_MODE_BITSIZE (TDOmode));
+  TYPE_PRECISION (dm1024_type_node) = GET_MODE_BITSIZE (TDOmode);
+  TYPE_SIZE_UNIT (dm1024_type_node) = size_int (GET_MODE_SIZE (TDOmode));
+  SET_TYPE_ALIGN (dm1024_type_node, 512);
+  TYPE_USER_ALIGN (dm1024_type_node) = 0;
+  lang_hooks.types.register_builtin_type (dm1024_type_node, "__dm1024");
+  t = build_qualified_type (dm1024_type_node, TYPE_QUAL_CONST);
+  ptr_dm1024_type_node = build_pointer_type (t);
+
   tdecl = add_builtin_type ("__bool char", bool_char_type_node);
   TYPE_NAME (bool_char_type_node) = tdecl;
 
diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index b9b791bfe8a..e6e90835544 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -437,14 +437,15 @@ rs6000_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
   if (cfun
       && !cfun->machine->mma_return_type_error
       && TREE_TYPE (cfun->decl) == fntype
-      && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode))
+      && OPAQUE_MODE_P (TYPE_MODE (type)))
     {
       /* Record we have now handled function CFUN, so the next time we
         are called, we do not re-report the same error.  */
       cfun->machine->mma_return_type_error = true;
       if (TYPE_CANONICAL (type) != NULL_TREE)
        type = TYPE_CANONICAL (type);
-      error ("invalid use of MMA type %qs as a function return value",
+      error ("invalid use of %s type %qs as a function return value",
+            (TYPE_MODE (type) == TDOmode) ? "dense math" : "MMA",
             IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
     }
 
@@ -1632,11 +1633,12 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
   int n_elts;
 
   /* We do not allow MMA types being used as function arguments.  */
-  if (mode == OOmode || mode == XOmode)
+  if (OPAQUE_MODE_P (mode))
     {
       if (TYPE_CANONICAL (type) != NULL_TREE)
        type = TYPE_CANONICAL (type);
-      error ("invalid use of MMA operand of type %qs as a function parameter",
+      error ("invalid use of %s operand of type %qs as a function parameter",
+            (mode == TDOmode) ? "dense math" : "MMA",
             IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
       return NULL_RTX;
     }
diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
index 7140b634c41..6fca027949d 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -79,3 +79,7 @@ PARTIAL_INT_MODE (TI, 128, PTI);
 /* Modes used by __vector_pair and __vector_quad.  */
 OPAQUE_MODE (OO, 32);
 OPAQUE_MODE (XO, 64);
+
+/* Mode used by __dm1024.  */
+OPAQUE_MODE (TDO, 128);
+
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index aebea1ba22e..c47729e7c3a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1897,7 +1897,22 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
                && (regno & 1) == 0);
     }
 
-  /* No other types other than XOmode can go in dense math registers.  */
+  if (mode == TDOmode)
+    {
+      if (!TARGET_DENSE_MATH)
+       return 0;
+
+      if (DMF_REGNO_P (regno))
+       return 1;
+
+      else
+       return (VSX_REGNO_P (regno)
+               && VSX_REGNO_P (last_regno)
+               && (regno & 1) == 0);
+    }
+
+  /* No other types other than XOmode or TDOmode can go in dense math
+     registers.  */
   if (DMF_REGNO_P (regno))
     return 0;
 
@@ -2005,9 +2020,11 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    GPR registers, and TImode can go in any GPR as well as VSX registers (PR
    57744).
 
-   Similarly, don't allow OOmode (vector pair, restricted to even VSX
-   registers) or XOmode (vector quad, restricted to FPR registers divisible
-   by 4) to tie with other modes.
+   Similarly, don't allow OOmode (vector pair), XOmode (vector quad), or
+   TDOmode (dense math register) to pair with anything else.  Vector pairs are
+   restricted to even/odd VSX registers.  Without dense math, vector quads are
+   limited to FPR registers divisible by 4.  With dense math, vector quads are
+   limited to even VSX registers or dense math registers.
 
    Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
    128-bit floating point on VSX systems ties with other vectors.  */
@@ -2016,7 +2033,8 @@ static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
-      || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
+      || mode1 == TDOmode || mode2 == PTImode || mode2 == OOmode
+      || mode2 == XOmode || mode2 == TDOmode)
     return mode1 == mode2;
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
@@ -2307,6 +2325,7 @@ rs6000_debug_reg_global (void)
     V4DFmode,
     OOmode,
     XOmode,
+    TDOmode,
     CCmode,
     CCUNSmode,
     CCEQmode,
@@ -2672,7 +2691,7 @@ rs6000_setup_reg_addr_masks (void)
          /* Special case dense math registers.  */
          if (rc == RELOAD_REG_DMR)
            {
-             if (TARGET_DENSE_MATH && m2 == XOmode)
+             if (TARGET_DENSE_MATH && (m2 == XOmode || m2 == TDOmode))
                {
                  addr_mask = RELOAD_REG_VALID;
                  reg_addr[m].addr_mask[rc] = addr_mask;
@@ -2779,10 +2798,10 @@ rs6000_setup_reg_addr_masks (void)
 
          /* Vector pairs can do both indexed and offset loads if the
             instructions are enabled, otherwise they can only do offset loads
-            since it will be broken into two vector moves.  Vector quads can
-            only do offset loads.  */
+            since it will be broken into two vector moves.  Vector quads and
+            dense math types can only do offset loads.  */
          else if ((addr_mask != 0) && TARGET_MMA
-                  && (m2 == OOmode || m2 == XOmode))
+                  && (m2 == OOmode || m2 == XOmode || m2 == TDOmode))
            {
              addr_mask |= RELOAD_REG_OFFSET;
              if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
@@ -3010,6 +3029,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
       rs6000_vector_align[XOmode] = 512;
     }
 
+  /* Add support for 1,024 bit dense math registers.  */
+  if (TARGET_DENSE_MATH)
+    {
+      rs6000_vector_unit[TDOmode] = VECTOR_NONE;
+      rs6000_vector_mem[TDOmode] = VECTOR_VSX;
+      rs6000_vector_align[TDOmode] = 512;
+    }
+
   /* Register class constraints for the constraints that depend on compile
      switches. When the VSX code was added, different constraints were added
      based on the type (DFmode, V2DFmode, V4SFmode).  For the vector types, all
@@ -3223,6 +3250,12 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
        }
     }
 
+  if (TARGET_DENSE_MATH)
+    {
+      reg_addr[TDOmode].reload_load = CODE_FOR_reload_tdo_from_memory;
+      reg_addr[TDOmode].reload_store = CODE_FOR_reload_tdo_to_memory;
+    }
+
   /* Precalculate HARD_REGNO_NREGS.  */
   for (r = 0; HARD_REGISTER_NUM_P (r); ++r)
     for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -8737,12 +8770,15 @@ reg_offset_addressing_ok_p (machine_mode mode)
        return mode_supports_dq_form (mode);
       break;
 
-      /* The vector pair/quad types support offset addressing if the
-        underlying vectors support offset addressing.  */
+      /* The vector pair/quad types and the dense math types support offset
+        addressing if the underlying vectors support offset addressing.  */
     case E_OOmode:
     case E_XOmode:
       return TARGET_MMA;
 
+    case E_TDOmode:
+      return TARGET_DENSE_MATH;
+
     case E_SDmode:
       /* If we can do direct load/stores of SDmode, restrict it to reg+reg
         addressing for the LFIWZX and STFIWX instructions.  */
@@ -11296,6 +11332,12 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
               (mode == OOmode) ? "__vector_pair" : "__vector_quad");
       break;
 
+    case E_TDOmode:
+      if (CONST_INT_P (operands[1]))
+       error ("%qs is an opaque type, and you cannot set it to constants",
+              "__dm1024");
+      break;
+
     case E_SImode:
     case E_DImode:
       /* Use default pattern for address of ELF small data */
@@ -12759,7 +12801,7 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
 
   /* We can transfer between VSX registers and dense math registers without
      needing extra registers.  */
-  if (TARGET_DENSE_MATH && mode == XOmode
+  if (TARGET_DENSE_MATH && (mode == XOmode || mode == TDOmode)
       && ((to_type == DMF_REG_TYPE && from_type == VSX_REG_TYPE)
          || (to_type == VSX_REG_TYPE && from_type == DMF_REG_TYPE)))
     return true;
@@ -13560,6 +13602,9 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
       if (mode == XOmode)
        return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
 
+      if (mode == TDOmode)
+       return VSX_REGS;
+
       if (GET_MODE_CLASS (mode) == MODE_INT)
        return GENERAL_REGS;
     }
@@ -20734,6 +20779,8 @@ rs6000_mangle_type (const_tree type)
     return "u13__vector_pair";
   if (type == vector_quad_type_node)
     return "u13__vector_quad";
+  if (type == dm1024_type_node)
+    return "u8__dm1024";
 
   /* For all other types, use the default mangling.  */
   return NULL;
@@ -22864,6 +22911,10 @@ rs6000_dmf_register_move_cost (machine_mode mode, reg_class_t rclass)
       if (mode == XOmode)
        return reg_move_base;
 
+      /* __dm1024 (i.e. TDOmode) is transferred in 2 instructions.  */
+      else if (mode == TDOmode)
+       return reg_move_base * 2;
+
       else
        return reg_move_base * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
     }
@@ -27550,9 +27601,10 @@ rs6000_split_multireg_move (rtx dst, rtx src)
   mode = GET_MODE (dst);
   nregs = hard_regno_nregs (reg, mode);
 
-  /* If we have a vector quad register for MMA, and this is a load or store,
-     see if we can use vector paired load/stores.  */
-  if (mode == XOmode && TARGET_MMA
+  /* If we have a vector quad register for MMA or dense math register
+     and this is a load or store, see if we can use vector paired
+     load/stores.  */
+  if ((mode == XOmode || mode == TDOmode) && TARGET_MMA
       && (MEM_P (dst) || MEM_P (src)))
     {
       reg_mode = OOmode;
@@ -27560,7 +27612,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
     }
   /* If we have a vector pair/quad mode, split it into two/four separate
      vectors.  */
-  else if (mode == OOmode || mode == XOmode)
+  else if (mode == OOmode || mode == XOmode || mode == TDOmode)
     reg_mode = V1TImode;
   else if (FP_REGNO_P (reg))
     reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
@@ -27606,13 +27658,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
       return;
     }
 
-  /* The __vector_pair and __vector_quad modes are multi-register
-     modes, so if we have to load or store the registers, we have to be
-     careful to properly swap them if we're in little endian mode
-     below.  This means the last register gets the first memory
-     location.  We also need to be careful of using the right register
-     numbers if we are splitting XO to OO.  */
-  if (mode == OOmode || mode == XOmode)
+  /* The __vector_pair, __vector_quad, and __dm1024 modes are multi-register
+     modes, so if we have to load or store the registers, we have to be careful
+     to properly swap them if we're in little endian mode below.  This means
+     the last register gets the first memory location.  We also need to be
+     careful of using the right register numbers if we are splitting XO to
+     OO.  */
+  if (mode == OOmode || mode == XOmode || mode == TDOmode)
     {
       nregs = hard_regno_nregs (reg, mode);
       int reg_mode_nregs = hard_regno_nregs (reg, reg_mode);
@@ -27749,7 +27801,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
         overlap.  */
       int i;
       /* XO/OO are opaque so cannot use subregs. */
-      if (mode == OOmode || mode == XOmode )
+      if (mode == OOmode || mode == XOmode || mode == TDOmode)
        {
          for (i = nregs - 1; i >= 0; i--)
            {
@@ -27923,7 +27975,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
            continue;
 
          /* XO/OO are opaque so cannot use subregs. */
-         if (mode == OOmode || mode == XOmode )
+         if (mode == OOmode || mode == XOmode || mode == TDOmode)
            {
              rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j);
              rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j);
@@ -28951,7 +29003,8 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
 
   if (frommode != tomode)
     {
-      /* Do not allow conversions to/from XOmode and OOmode types.  */
+      /* Do not allow conversions to/from XOmode, OOmode, and TDOmode
+        types.  */
       if (frommode == XOmode)
        return N_("invalid conversion from type %<__vector_quad%>");
       if (tomode == XOmode)
@@ -28960,6 +29013,10 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
        return N_("invalid conversion from type %<__vector_pair%>");
       if (tomode == OOmode)
        return N_("invalid conversion to type %<__vector_pair%>");
+      if (frommode == TDOmode)
+       return N_("invalid conversion from type %<__dm1024%>");
+      if (tomode == TDOmode)
+       return N_("invalid conversion to type %<__dm1024%>");
     }
 
   /* Conversion allowed.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 65bffaa75b8..62f014f1951 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -993,7 +993,7 @@ enum data_align { align_abi, align_opt, align_both };
 /* Modes that are not vectors, but require vector alignment.  Treat these like
    vectors in terms of loads and stores.  */
 #define VECTOR_ALIGNMENT_P(MODE)                                       \
-  (FLOAT128_VECTOR_P (MODE) || (MODE) == OOmode || (MODE) == XOmode)
+  (FLOAT128_VECTOR_P (MODE) || OPAQUE_MODE_P (MODE))
 
 #define ALTIVEC_VECTOR_MODE(MODE)                                      \
   ((MODE) == V16QImode                                                 \
@@ -2284,6 +2284,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_const_str,                 /* pointer to const char * */
   RS6000_BTI_vector_pair,       /* unsigned 256-bit types (vector pair).  */
   RS6000_BTI_vector_quad,       /* unsigned 512-bit types (vector quad).  */
+  RS6000_BTI_dm1024,            /* unsigned 1,024-bit types (dmf).  */
   RS6000_BTI_const_ptr_void,     /* const pointer to void */
   RS6000_BTI_ptr_V16QI,
   RS6000_BTI_ptr_V1TI,
@@ -2322,6 +2323,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_ptr_dfloat128,
   RS6000_BTI_ptr_vector_pair,
   RS6000_BTI_ptr_vector_quad,
+  RS6000_BTI_ptr_dm1024,
   RS6000_BTI_ptr_long_long,
   RS6000_BTI_ptr_long_long_unsigned,
   RS6000_BTI_INTPTI,
@@ -2383,6 +2385,7 @@ enum rs6000_builtin_type_index
 #define const_str_type_node             (rs6000_builtin_types[RS6000_BTI_const_str])
 #define vector_pair_type_node           (rs6000_builtin_types[RS6000_BTI_vector_pair])
 #define vector_quad_type_node           (rs6000_builtin_types[RS6000_BTI_vector_quad])
+#define dm1024_type_node                (rs6000_builtin_types[RS6000_BTI_dm1024])
 #define pcvoid_type_node                (rs6000_builtin_types[RS6000_BTI_const_ptr_void])
 #define ptr_V16QI_type_node             (rs6000_builtin_types[RS6000_BTI_ptr_V16QI])
 #define ptr_V1TI_type_node              (rs6000_builtin_types[RS6000_BTI_ptr_V1TI])
@@ -2421,6 +2424,7 @@ enum rs6000_builtin_type_index
 #define ptr_dfloat128_type_node                 (rs6000_builtin_types[RS6000_BTI_ptr_dfloat128])
 #define ptr_vector_pair_type_node       (rs6000_builtin_types[RS6000_BTI_ptr_vector_pair])
 #define ptr_vector_quad_type_node       (rs6000_builtin_types[RS6000_BTI_ptr_vector_quad])
+#define ptr_dm1024_type_node            (rs6000_builtin_types[RS6000_BTI_ptr_dm1024])
 #define ptr_long_long_integer_type_node         (rs6000_builtin_types[RS6000_BTI_ptr_long_long])
 #define ptr_long_long_unsigned_type_node (rs6000_builtin_types[RS6000_BTI_ptr_long_long_unsigned])
 
-- 
2.54.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: [email protected]
