The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with
the VSX registers 0..31, but logically the accumulator registers were separate
from the FPR registers. In ISA 3.1, it was anticipated that in future systems
the accumulator registers might not overlap with the FPR registers. This patch
adds the support for dense math registers as separate registers.
This particular patch does not change the MMA support to use the accumulators
within the dense math registers. This patch just adds the basic support for
having separate DMRs. The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.
For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.
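As an illustration only (not part of the patch): the rs6000-c.cc change below defines __DENSE_MATH__ next to the existing __MMA__ macro when dense math is enabled, so user code could tell the configurations apart roughly as follows. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical labels for the three configurations; not part of GCC.  */
enum acc_model { ACC_NONE, ACC_MMA_FPR, ACC_DENSE_MATH };

/* Report which accumulator model this translation unit was compiled for,
   based on the predefined macros __MMA__ and __DENSE_MATH__.  */
enum acc_model
compiled_acc_model (void)
{
#if defined (__MMA__) && defined (__DENSE_MATH__)
  return ACC_DENSE_MATH;	/* MMA with separate dense math registers.  */
#elif defined (__MMA__)
  return ACC_MMA_FPR;		/* MMA with accumulators overlapping FPRs.  */
#else
  return ACC_NONE;		/* No MMA support selected.  */
#endif
}

/* Usage: the result is always one of the three labels.  */
void
check_compiled_acc_model (void)
{
  enum acc_model m = compiled_acc_model ();
  assert (m == ACC_NONE || m == ACC_MMA_FPR || m == ACC_DENSE_MATH);
}
```

Since the patch only defines __DENSE_MATH__ inside the MMA check, the sketch never reports dense math without MMA.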
This patch updates the wD constraint added in the previous patch. If MMA is
selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint
will allow access to accumulators that overlap with VSX registers 0..31. If
both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint
will only allow dense math registers.
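The selection described above can be modelled in a few lines of plain C. This is a simplified sketch of the rs6000_init_hard_regno_mode_ok change in this patch, not the GCC code itself; the MODEL_ names are hypothetical, and the NO_REGS default for the non-MMA case is an assumption about how unset constraints behave.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC register classes involved.  */
enum model_reg_class { MODEL_NO_REGS, MODEL_FLOAT_REGS, MODEL_DM_REGS };

/* Register class that the wD constraint maps to for a given option set.
   Dense math without MMA is rejected by the option handling in this
   patch, so that combination is treated like "no MMA" here.  */
enum model_reg_class
model_wD_class (bool mma, bool dense_math)
{
  if (!mma)
    return MODEL_NO_REGS;	/* Assumption: constraint is unusable.  */

  return dense_math ? MODEL_DM_REGS : MODEL_FLOAT_REGS;
}

/* Usage: power10-style and future-style option sets.  */
void
check_model_wD_class (void)
{
  assert (model_wD_class (true, false) == MODEL_FLOAT_REGS);
  assert (model_wD_class (true, true) == MODEL_DM_REGS);
}
```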
This patch modifies the existing %A output modifier. If MMA is selected but
dense math is not selected, then the %A output modifier converts the VSX
register number to the accumulator number by dividing it by 4. If both MMA and
dense math are selected, then %A will map the separate DMF registers into 0..7.
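A worked model of the %A mapping, as a sketch mirroring the print_operand change in this patch. The DMF register numbers 111..118 come from the REG_ALLOC_ORDER hunk below; 32..63 for the FPRs is an assumption about GCC's internal numbering, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum
{
  MODEL_FIRST_FPR_REGNO = 32,	/* Assumption: GCC's number for f0.  */
  MODEL_LAST_FPR_REGNO = 63,
  MODEL_FIRST_DMF_REGNO = 111,	/* From this patch: DMFs are 111..118.  */
  MODEL_LAST_DMF_REGNO = 118
};

/* Return the accumulator number that %A would print for REGNO, or -1 if
   the operand would be rejected.  */
int
model_accumulator_number (int regno, bool dense_math)
{
  if (dense_math)
    {
      /* Only the separate DMF registers are accepted.  */
      if (regno >= MODEL_FIRST_DMF_REGNO && regno <= MODEL_LAST_DMF_REGNO)
	return regno - MODEL_FIRST_DMF_REGNO;
      return -1;
    }

  /* Without dense math, an accumulator is an FPR whose number is a
     multiple of 4, and the accumulator number is the FPR index / 4.  */
  if (regno < MODEL_FIRST_FPR_REGNO || regno > MODEL_LAST_FPR_REGNO
      || (regno % 4) != 0)
    return -1;

  return (regno - MODEL_FIRST_FPR_REGNO) / 4;
}

/* Usage: f12 is accumulator 3; the last DMF register is accumulator 7.  */
void
check_model_accumulator_number (void)
{
  assert (model_accumulator_number (44, false) == 3);
  assert (model_accumulator_number (118, true) == 7);
}
```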
The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:
1) If possible, don't use extended asm, but instead use the MMA built-in
functions;
2) If you do need to write extended asm, change the d constraints
targeting accumulators to wD;
3) Only use the built-in zero, assemble and disassemble functions to create
or move data between vector quad types and dense math accumulators.
I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly
in the extended asm code. The reason is these instructions assume there
is a 1-to-1 correspondence between 4 adjacent FPR registers and an
accumulator that overlaps with those registers. With accumulators
now being separate registers, there no longer is a 1-to-1
correspondence.
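A minimal sketch of points 1 and 3, assuming the existing __builtin_mma_xxsetaccz and __builtin_mma_disassemble_acc built-ins: the accumulator is zeroed and copied out without any hand-written xxsetaccz or xxmfacc, so the same source should work with or without dense math. The function name is hypothetical, and the #else branch is only a stand-in so the sketch also builds on targets without MMA.

```c
#include <assert.h>
#include <string.h>

#if defined (__MMA__)
typedef __vector unsigned char vec_t;

/* Zero an accumulator and copy its 64 bytes to OUT, using only the MMA
   built-in functions.  The compiler picks the right instructions for
   accumulators in FPRs or in dense math registers.  */
void
zero_accumulator_to_memory (unsigned char out[64])
{
  __vector_quad acc;
  vec_t rows[4];

  __builtin_mma_xxsetaccz (&acc);		/* Zero the accumulator.  */
  __builtin_mma_disassemble_acc (rows, &acc);	/* Move it to 4 vectors.  */
  memcpy (out, rows, sizeof rows);
}
#else
/* Stand-in for targets without MMA: produce the same result in plain C.  */
void
zero_accumulator_to_memory (unsigned char out[64])
{
  memset (out, 0, 64);
}
#endif

/* Usage: every byte of the output must be zero afterwards.  */
void
check_zero_accumulator (void)
{
  unsigned char buf[64];

  memset (buf, 0xff, sizeof buf);
  zero_accumulator_to_memory (buf);
  for (int i = 0; i < 64; i++)
    assert (buf[i] == 0);
}
```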
It is possible that the mangling for DMFs and the GDB register numbers may
change in the future.
I have built bootstrap GCC compilers on little endian and big endian
PowerPC servers, and there were no regressions. Can I commit this
patch to GCC 16 once the following patches have been applied?
* https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700539.html
* https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700540.html
* https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700542.html
gcc/
2025-11-13 Michael Meissner <[email protected]>
* config/rs6000/mma.md (UNSPEC_MMA_DMSETDMRZ): New unspec.
(movxo): Add comments about dense math registers.
(movxo_nodm): Rename from movxo and restrict the usage to machines
without dense math registers.
(movxo_dm): New insn for movxo support for machines with dense math
registers.
(mma_<acc>): Restrict usage to machines without dense math registers.
(mma_xxsetaccz): Add a define_expand wrapper, and add support for dense
math registers.
(mma_dmsetdmrz): New insn.
* config/rs6000/predicates.md (dmf_operand): New predicate.
(accumulator_operand): Add support for dense math registers.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
not issue a de-prime instruction when disassembling a vector quad on a
system with dense math registers.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__DENSE_MATH__ if we have dense math registers.
* config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): Add -mdense-math.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMF_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMF.
(LAST_RELOAD_REG_CLASS): Add support for DMF registers and the wD
constraint.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_option_override_internal): If -mdense-math, issue an error if
-mno-mma or not -mcpu=future.
(rs6000_secondary_reload_memory): Add support for DMF registers.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(print_operand): Make %A handle both FPRs and DMRs.
(rs6000_dmf_register_move_cost): New helper function.
(rs6000_register_move_cost): Add support for DMR registers.
(rs6000_memory_move_cost): Likewise.
(rs6000_compute_pressure_classes): Likewise.
(rs6000_debugger_regno): Likewise.
(rs6000_opt_masks): Add -mdense-math support.
(rs6000_split_multireg_move): Add support for DMRs.
* config/rs6000/rs6000.h (TARGET_MMA_NO_DENSE_MATH): New macro.
(UNITS_PER_DMF_WORD): Likewise.
(FIRST_PSEUDO_REGISTER): Update for DMRs.
(FIXED_REGISTERS): Add DMRs.
(CALL_REALLY_USED_REGISTERS): Likewise.
(REG_ALLOC_ORDER): Likewise.
(DMF_REGNO_P): New macro.
(enum reg_class): Add DM_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
(REGISTER_NAMES): Add DMF registers.
(ADDITIONAL_REGISTER_NAMES): Likewise.
* config/rs6000/rs6000.md (FIRST_DMF_REGNO): New constant.
(LAST_DMF_REGNO): Likewise.
* config/rs6000/rs6000.opt (-mdense-math): New option.
---
gcc/config/rs6000/mma.md | 74 +++++++--
gcc/config/rs6000/predicates.md | 21 ++-
gcc/config/rs6000/rs6000-builtin.cc | 5 +-
gcc/config/rs6000/rs6000-c.cc | 9 +-
gcc/config/rs6000/rs6000-cpus.def | 2 +
gcc/config/rs6000/rs6000.cc | 231 +++++++++++++++++++++++-----
gcc/config/rs6000/rs6000.h | 40 ++++-
gcc/config/rs6000/rs6000.md | 2 +
gcc/config/rs6000/rs6000.opt | 4 +
9 files changed, 325 insertions(+), 63 deletions(-)
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 9f866361376..3f5852ca2bb 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -90,6 +90,7 @@ (define_c_enum "unspec"
UNSPEC_MMA_XVI8GER4SPP
UNSPEC_MMA_XXMFACC
UNSPEC_MMA_XXMTACC
+ UNSPEC_MMA_DMSETDMRZ
])
(define_c_enum "unspecv"
@@ -313,7 +314,9 @@ (define_insn_and_split "*movoo"
(set_attr "length" "*,*,8")])
-;; Vector quad support. XOmode can only live in FPRs.
+;; Vector quad support. Under the original MMA, XOmode can only live in VSX
+;; registers 0..31. With dense math, XOmode can live in either VSX registers
+;; (0..63) or DMF registers.
(define_expand "movxo"
[(set (match_operand:XO 0 "nonimmediate_operand")
(match_operand:XO 1 "input_operand"))]
@@ -338,10 +341,10 @@ (define_expand "movxo"
gcc_assert (false);
})
-(define_insn_and_split "*movxo"
+(define_insn_and_split "*movxo_nodm"
[(set (match_operand:XO 0 "nonimmediate_operand" "=d,ZwO,d")
(match_operand:XO 1 "input_operand" "ZwO,d,d"))]
- "TARGET_MMA
+ "TARGET_MMA_NO_DENSE_MATH
&& (gpc_reg_operand (operands[0], XOmode)
|| gpc_reg_operand (operands[1], XOmode))"
"@
@@ -358,6 +361,31 @@ (define_insn_and_split "*movxo"
(set_attr "length" "*,*,16")
(set_attr "max_prefixed_insns" "2,2,*")])
+(define_insn_and_split "*movxo_dm"
+ [(set (match_operand:XO 0 "nonimmediate_operand" "=wa,ZwO,wa,wD,wD,wa")
+ (match_operand:XO 1 "input_operand" "ZwO,wa, wa,wa,wD,wD"))]
+ "TARGET_DENSE_MATH
+ && (gpc_reg_operand (operands[0], XOmode)
+ || gpc_reg_operand (operands[1], XOmode))"
+ "@
+ #
+ #
+ #
+ dmxxinstdmr512 %0,%1,%Y1,0
+ dmmr %0,%1
+ dmxxextfdmr512 %0,%Y0,%1,0"
+ "&& reload_completed
+ && !dmf_operand (operands[0], XOmode)
+ && !dmf_operand (operands[1], XOmode)"
+ [(const_int 0)]
+{
+ rs6000_split_multireg_move (operands[0], operands[1]);
+ DONE;
+}
+ [(set_attr "type" "vecload,vecstore,veclogical,mma,mma,mma")
+ (set_attr "length" "*,*,16,*,*,*")
+ (set_attr "max_prefixed_insns" "2,2,*,*,*,*")])
+
(define_expand "vsx_assemble_pair"
[(match_operand:OO 0 "vsx_register_operand")
(match_operand:V16QI 1 "mma_assemble_input_operand")
@@ -456,29 +484,53 @@ (define_expand "mma_disassemble_acc"
DONE;
})
-;; MMA instructions that do not use their accumulators as an input, still
-;; must not allow their vector operands to overlap the registers used by
-;; the accumulator. We enforce this by marking the output as early clobber.
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator. We enforce this by marking the output as early clobber. The
+;; prime and de-prime instructions are not needed on systems with dense math
+;; registers.
(define_insn "mma_<acc>"
[(set (match_operand:XO 0 "accumulator_operand" "=&wD")
- (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")]
+ (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
MMA_ACC))]
- "TARGET_MMA"
+ "TARGET_MMA_NO_DENSE_MATH"
"<acc> %A0"
[(set_attr "type" "mma")])
;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE. If we have dense math registers, we can just use a normal
+;; UNSPEC instead of UNSPEC_VOLATILE.
-(define_insn "mma_xxsetaccz"
- [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+ [(set (match_operand:XO 0 "accumulator_operand")
(unspec_volatile:XO [(const_int 0)]
UNSPECV_MMA_XXSETACCZ))]
"TARGET_MMA"
+{
+ if (TARGET_DENSE_MATH)
+ {
+ emit_insn (gen_mma_dmsetdmrz (operands[0]));
+ DONE;
+ }
+})
+
+(define_insn "*mma_xxsetaccz"
+ [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+ (unspec_volatile:XO [(const_int 0)]
+ UNSPECV_MMA_XXSETACCZ))]
+ "TARGET_MMA_NO_DENSE_MATH"
"xxsetaccz %A0"
[(set_attr "type" "mma")])
+(define_insn "mma_dmsetdmrz"
+ [(set (match_operand:XO 0 "accumulator_operand" "=wD")
+ (unspec [(const_int 0)]
+ UNSPEC_MMA_DMSETDMRZ))]
+ "TARGET_DENSE_MATH"
+ "dmsetdmrz %A0"
+ [(set_attr "type" "mma")])
+
(define_insn "mma_<vv>"
[(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9f152037222..f1e03ec30c9 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -186,8 +186,23 @@ (define_predicate "vlogical_operand"
return VLOGICAL_REGNO_P (REGNO (op));
})
+;; Return 1 if op is a DMF register
+(define_predicate "dmf_operand"
+ (match_operand 0 "register_operand")
+{
+ if (!REG_P (op))
+ return 0;
+
+ if (!HARD_REGISTER_P (op))
+ return 1;
+
+ return DMF_REGNO_P (REGNO (op));
+})
+
;; Return 1 if op is an accumulator. On power10 systems, the accumulators
-;; overlap with the FPRs.
+;; overlap with the FPRs, while on systems with dense math, the accumulators
+;; are separate dense math registers and do not overlap with the FPR
+;; registers.
(define_predicate "accumulator_operand"
(match_operand 0 "register_operand")
{
@@ -198,7 +213,9 @@ (define_predicate "accumulator_operand"
return 1;
int r = REGNO (op);
- return FP_REGNO_P (r) && (r & 3) == 0;
+ return (TARGET_DENSE_MATH
+ ? DMF_REGNO_P (r)
+ : FP_REGNO_P (r) && (r & 3) == 0);
})
;; Return 1 if op is the carry register.
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index bc1580f051b..6b7e5686f0c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -1125,8 +1125,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi,
}
/* If we're disassembling an accumulator into a different type, we need
- to emit a xxmfacc instruction now, since we cannot do it later. */
- if (fncode == RS6000_BIF_DISASSEMBLE_ACC)
+ to emit a xxmfacc instruction now, since we cannot do it later. If we
+ have dense math registers, we don't need to do this. */
+ if (fncode == RS6000_BIF_DISASSEMBLE_ACC && !TARGET_DENSE_MATH)
{
new_decl = rs6000_builtin_decls[RS6000_BIF_XXMFACC_INTERNAL];
new_call = gimple_build_call (new_decl, 1, src);
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 6757a2477ad..e202fd6c7df 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -587,9 +587,14 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
if (rs6000_cpu == PROCESSOR_CELL)
rs6000_define_or_undefine_macro (define_p, "__PPU__");
- /* Tell the user if we support the MMA instructions. */
+ /* Tell the user if we support the MMA instructions. Also tell them if MMA
+ uses the dense math registers. */
if ((flags & OPTION_MASK_MMA) != 0)
- rs6000_define_or_undefine_macro (define_p, "__MMA__");
+ {
+ rs6000_define_or_undefine_macro (define_p, "__MMA__");
+ if ((flags & OPTION_MASK_DENSE_MATH) != 0)
+ rs6000_define_or_undefine_macro (define_p, "__DENSE_MATH__");
+ }
/* Whether pc-relative code is being generated. */
if ((flags & OPTION_MASK_PCREL) != 0)
rs6000_define_or_undefine_macro (define_p, "__PCREL__");
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index a0e6745495d..c03b069b779 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -91,6 +91,7 @@
will be fixed in potential future machines. */
#define FUTURE_MASKS_SERVER (POWER11_MASKS_SERVER \
| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \
+ | OPTION_MASK_DENSE_MATH \
| OPTION_MASK_FUTURE)
/* Flags that need to be turned off if -mno-vsx. */
@@ -124,6 +125,7 @@
| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \
| OPTION_MASK_CMPB \
| OPTION_MASK_CRYPTO \
+ | OPTION_MASK_DENSE_MATH \
| OPTION_MASK_DFP \
| OPTION_MASK_DLMZB \
| OPTION_MASK_EFFICIENT_UNALIGNED_VSX \
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index ac95ea05657..570e8a14f2d 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -292,7 +292,8 @@ enum rs6000_reg_type {
ALTIVEC_REG_TYPE,
FPR_REG_TYPE,
SPR_REG_TYPE,
- CR_REG_TYPE
+ CR_REG_TYPE,
+ DMF_REG_TYPE
};
/* Map register class to register type. */
@@ -306,22 +307,23 @@ static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES];
/* Register classes we care about in secondary reload or go if legitimate
- address. We only need to worry about GPR, FPR, and Altivec registers here,
- along an ANY field that is the OR of the 3 register classes. */
+ address. We only need to worry about GPR, FPR, Altivec, and DMF registers
+ here, along an ANY field that is the OR of the 4 register classes. */
enum rs6000_reload_reg_type {
RELOAD_REG_GPR, /* General purpose registers. */
RELOAD_REG_FPR, /* Traditional floating point regs. */
RELOAD_REG_VMX, /* Altivec (VMX) registers. */
- RELOAD_REG_ANY, /* OR of GPR, FPR, Altivec masks. */
+ RELOAD_REG_DMF, /* DMF registers. */
+ RELOAD_REG_ANY, /* OR of GPR/FPR/VMX/DMF masks. */
N_RELOAD_REG
};
-/* For setting up register classes, loop through the 3 register classes mapping
+/* For setting up register classes, loop through the 4 register classes mapping
into real registers, and skip the ANY class, which is just an OR of the
bits. */
#define FIRST_RELOAD_REG_CLASS RELOAD_REG_GPR
-#define LAST_RELOAD_REG_CLASS RELOAD_REG_VMX
+#define LAST_RELOAD_REG_CLASS RELOAD_REG_DMF
/* Map reload register type to a register in the register class. */
struct reload_reg_map_type {
@@ -333,6 +335,7 @@ static const struct reload_reg_map_type reload_reg_map[N_RELOAD_REG] = {
{ "Gpr", FIRST_GPR_REGNO }, /* RELOAD_REG_GPR. */
{ "Fpr", FIRST_FPR_REGNO }, /* RELOAD_REG_FPR. */
{ "VMX", FIRST_ALTIVEC_REGNO }, /* RELOAD_REG_VMX. */
+ { "DMF", FIRST_DMF_REGNO }, /* RELOAD_REG_DMF. */
{ "Any", -1 }, /* RELOAD_REG_ANY. */
};
@@ -1226,6 +1229,8 @@ char rs6000_reg_names[][8] =
"0", "1", "2", "3", "4", "5", "6", "7",
/* vrsave vscr sfp */
"vrsave", "vscr", "sfp",
+ /* DMFs */
+ "0", "1", "2", "3", "4", "5", "6", "7",
};
#ifdef TARGET_REGNAMES
@@ -1252,6 +1257,8 @@ static const char alt_reg_names[][8] =
"%cr0", "%cr1", "%cr2", "%cr3", "%cr4", "%cr5", "%cr6", "%cr7",
/* vrsave vscr sfp */
"vrsave", "vscr", "sfp",
+ /* DMFs */
+ "%dmr0", "%dmr1", "%dmr2", "%dmr3", "%dmr4", "%dmr5", "%dmr6", "%dmr7",
};
#endif
@@ -1842,6 +1849,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
else if (ALTIVEC_REGNO_P (regno))
reg_size = UNITS_PER_ALTIVEC_WORD;
+ else if (DMF_REGNO_P (regno))
+ reg_size = UNITS_PER_DMF_WORD;
+
else
reg_size = UNITS_PER_WORD;
@@ -1863,9 +1873,35 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
if (mode == OOmode)
return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0);
- /* MMA accumulator modes need FPR registers divisible by 4. */
+ /* On ISA 3.1 (power10), MMA accumulator modes need FPR registers divisible
+ by 4.
+
+ If dense math registers are enabled, we can allow all VSX registers plus
+ the DMF registers. VSX registers are used to load and store the registers
+ as the accumulator registers do not have load and store instructions.
+ Because we just use the VSX registers for load/store operations, we just
+ need to make sure load vector pair and store vector pair instructions can
+ be used. */
if (mode == XOmode)
- return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0);
+ {
+ if (!TARGET_MMA)
+ return 0;
+
+ else if (!TARGET_DENSE_MATH)
+ return (FP_REGNO_P (regno) && (regno & 3) == 0);
+
+ else if (DMF_REGNO_P (regno))
+ return 1;
+
+ else
+ return (VSX_REGNO_P (regno)
+ && VSX_REGNO_P (last_regno)
+ && (regno & 1) == 0);
+ }
+
+ /* No modes other than XOmode can go in DMFs. */
+ if (DMF_REGNO_P (regno))
+ return 0;
/* PTImode can only go in GPRs. Quad word memory operations require even/odd
register combinations, and use PTImode where we need to deal with quad
@@ -2308,6 +2344,7 @@ rs6000_debug_reg_global (void)
rs6000_debug_reg_print (FIRST_ALTIVEC_REGNO,
LAST_ALTIVEC_REGNO,
"vs");
+ rs6000_debug_reg_print (FIRST_DMF_REGNO, LAST_DMF_REGNO, "dmf");
rs6000_debug_reg_print (LR_REGNO, LR_REGNO, "lr");
rs6000_debug_reg_print (CTR_REGNO, CTR_REGNO, "ctr");
rs6000_debug_reg_print (CR0_REGNO, CR7_REGNO, "cr");
@@ -2634,6 +2671,21 @@ rs6000_setup_reg_addr_masks (void)
addr_mask = 0;
reg = reload_reg_map[rc].reg;
+ /* Special case DMF registers. */
+ if (rc == RELOAD_REG_DMF)
+ {
+ if (TARGET_DENSE_MATH && m2 == XOmode)
+ {
+ addr_mask = RELOAD_REG_VALID;
+ reg_addr[m].addr_mask[rc] = addr_mask;
+ any_addr_mask |= addr_mask;
+ }
+ else
+ reg_addr[m].addr_mask[rc] = 0;
+
+ continue;
+ }
+
/* Can mode values go in the GPR/FPR/Altivec registers? */
if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg])
{
@@ -2784,6 +2836,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
for (r = CR1_REGNO; r <= CR7_REGNO; ++r)
rs6000_regno_regclass[r] = CR_REGS;
+ for (r = FIRST_DMF_REGNO; r <= LAST_DMF_REGNO; ++r)
+ rs6000_regno_regclass[r] = DM_REGS;
+
rs6000_regno_regclass[LR_REGNO] = LINK_REGS;
rs6000_regno_regclass[CTR_REGNO] = CTR_REGS;
rs6000_regno_regclass[CA_REGNO] = NO_REGS;
@@ -2808,6 +2863,7 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE;
reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE;
reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE;
+ reg_class_to_reg_type[(int)DM_REGS] = DMF_REG_TYPE;
if (TARGET_VSX)
{
@@ -2994,8 +3050,11 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
if (TARGET_DIRECT_MOVE_128)
rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
+ /* Support for the accumulator registers, either FPR registers (aka original
+ mma) or DMF registers (dense math). */
if (TARGET_MMA)
- rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS;
+ rs6000_constraints[RS6000_CONSTRAINT_wD]
+ = TARGET_DENSE_MATH ? DM_REGS : FLOAT_REGS;
/* Set up the reload helper and direct move functions. */
if (TARGET_VSX || TARGET_ALTIVEC)
@@ -4410,6 +4469,16 @@ rs6000_option_override_internal (bool global_init_p)
if (!TARGET_PCREL && TARGET_PCREL_OPT)
rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
+ /* Turn off dense math MMA+ options on non-future systems. */
+ if (TARGET_DENSE_MATH && (!TARGET_MMA || !TARGET_FUTURE))
+ {
+ if ((rs6000_isa_flags_explicit & OPTION_MASK_DENSE_MATH) != 0)
+ error ("%qs requires %qs", "-mdense-math",
+ (!TARGET_FUTURE ? "-mcpu=future" : "-mma"));
+
+ rs6000_isa_flags &= ~OPTION_MASK_DENSE_MATH;
+ }
+
if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
@@ -12356,6 +12425,11 @@ rs6000_secondary_reload_memory (rtx addr,
addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX]
& ~RELOAD_REG_AND_M16);
+ /* DMF registers use VSX registers for memory operations, and need to
+ generate some extra instructions. */
+ else if (rclass == DM_REGS)
+ return 2;
+
/* If the register allocator hasn't made up its mind yet on the register
class to use, settle on defaults to use. */
else if (rclass == NO_REGS)
@@ -12684,6 +12758,13 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
|| (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE)))
return true;
+ /* We can transfer between VSX registers and DMF registers without needing
+ extra registers. */
+ if (TARGET_DENSE_MATH && mode == XOmode
+ && ((to_type == DMF_REG_TYPE && from_type == VSX_REG_TYPE)
+ || (to_type == VSX_REG_TYPE && from_type == DMF_REG_TYPE)))
+ return true;
+
return false;
}
@@ -13378,6 +13459,10 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
machine_mode mode = GET_MODE (x);
bool is_constant = CONSTANT_P (x);
+ /* DMF registers can't be loaded or stored. */
+ if (rclass == DM_REGS)
+ return NO_REGS;
+
/* If a mode can't go in FPR/ALTIVEC/VSX registers, don't return a preferred
reload class for it. */
if ((rclass == ALTIVEC_REGS || rclass == VSX_REGS)
@@ -13474,7 +13559,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
return VSX_REGS;
if (mode == XOmode)
- return FLOAT_REGS;
+ return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
if (GET_MODE_CLASS (mode) == MODE_INT)
return GENERAL_REGS;
@@ -13599,6 +13684,11 @@ rs6000_secondary_reload_class (enum reg_class rclass, machine_mode mode,
else
regno = -1;
+ /* DMF registers don't have loads or stores. We have to go through the VSX
+ registers to load XOmode (vector quad). */
+ if (TARGET_DENSE_MATH && rclass == DM_REGS)
+ return VSX_REGS;
+
/* If we have VSX register moves, prefer moving scalar values between
Altivec registers and GPR by going via an FPR (and then via memory)
instead of reloading the secondary memory address for Altivec moves. */
@@ -14130,8 +14220,19 @@ print_operand (FILE *file, rtx x, int code)
output_operand. */
case 'A':
- /* Write the MMA accumulator number associated with VSX register X. */
- if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
+ /* Write the MMA accumulator number associated with VSX register X. On
+ dense math systems, only allow DMF accumulators, not accumulators
+ overlapping with the FPR registers. */
+ if (!REG_P (x))
+ output_operand_lossage ("invalid %%A value");
+ else if (TARGET_DENSE_MATH)
+ {
+ if (DMF_REGNO_P (REGNO (x)))
+ fprintf (file, "%d", REGNO (x) - FIRST_DMF_REGNO);
+ else
+ output_operand_lossage ("%%A operand is not a DMF");
+ }
+ else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
output_operand_lossage ("invalid %%A value");
else
fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
@@ -22751,6 +22852,31 @@ rs6000_debug_address_cost (rtx x, machine_mode mode,
}
+/* Subroutine to determine the move cost of dense math registers. If we are
+ moving to/from VSX registers, the cost is either 1 move (for 512-bit
+ accumulators) or 2 moves (for 1,024-bit DMF registers). If we are moving
+ to anything else like GPR registers, make the cost very high. */
+
+static int
+rs6000_dmf_register_move_cost (machine_mode mode, reg_class_t rclass)
+{
+ const int reg_move_base = 2;
+ HARD_REG_SET vsx_set = (reg_class_contents[rclass]
+ & reg_class_contents[VSX_REGS]);
+
+ if (TARGET_DENSE_MATH && !hard_reg_set_empty_p (vsx_set))
+ {
+ /* __vector_quad (i.e. XOmode) is transferred in 1 instruction. */
+ if (mode == XOmode)
+ return reg_move_base;
+
+ else
+ return reg_move_base * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
+ }
+
+ return 1000 * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
+}
+
/* A C expression returning the cost of moving data from a register of class
CLASS1 to one of CLASS2. */
@@ -22764,17 +22890,28 @@ rs6000_register_move_cost (machine_mode mode,
if (TARGET_DEBUG_COST)
dbg_cost_ctrl++;
+ HARD_REG_SET to_vsx, from_vsx;
+ to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
+ from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
+
+ /* Special case DMF registers, that can only move to/from VSX registers. */
+ if (from == DM_REGS && to == DM_REGS)
+ ret = 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
+
+ else if (from == DM_REGS)
+ ret = rs6000_dmf_register_move_cost (mode, to);
+
+ else if (to == DM_REGS)
+ ret = rs6000_dmf_register_move_cost (mode, from);
+
/* If we have VSX, we can easily move between FPR or Altivec registers,
otherwise we can only easily move within classes.
Do this first so we give best-case answers for union classes
containing both gprs and vsx regs. */
- HARD_REG_SET to_vsx, from_vsx;
- to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
- from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
- if (!hard_reg_set_empty_p (to_vsx)
- && !hard_reg_set_empty_p (from_vsx)
- && (TARGET_VSX
- || hard_reg_set_intersect_p (to_vsx, from_vsx)))
+ else if (!hard_reg_set_empty_p (to_vsx)
+ && !hard_reg_set_empty_p (from_vsx)
+ && (TARGET_VSX
+ || hard_reg_set_intersect_p (to_vsx, from_vsx)))
{
int reg = FIRST_FPR_REGNO;
if (TARGET_VSX
@@ -22870,6 +23007,9 @@ rs6000_memory_move_cost (machine_mode mode, reg_class_t rclass,
ret = 4 * hard_regno_nregs (32, mode);
else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS))
ret = 4 * hard_regno_nregs (FIRST_ALTIVEC_REGNO, mode);
+ else if (reg_classes_intersect_p (rclass, DM_REGS))
+ ret = (rs6000_dmf_register_move_cost (mode, VSX_REGS)
+ + rs6000_memory_move_cost (mode, VSX_REGS, false));
else
ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS);
@@ -24078,6 +24218,8 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes)
if (TARGET_HARD_FLOAT)
pressure_classes[n++] = FLOAT_REGS;
}
+ if (TARGET_DENSE_MATH)
+ pressure_classes[n++] = DM_REGS;
pressure_classes[n++] = CR_REGS;
pressure_classes[n++] = SPECIAL_REGS;
@@ -24242,6 +24384,10 @@ rs6000_debugger_regno (unsigned int regno, unsigned int format)
return 67;
if (regno == 64)
return 64;
+ /* XXX: This is a guess. The GCC register number for FIRST_DMF_REGNO is 111,
+ but the frame pointer regnum uses that. */
+ if (DMF_REGNO_P (regno))
+ return regno - FIRST_DMF_REGNO + 112;
gcc_unreachable ();
}
@@ -24463,6 +24609,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
false, true },
{ "cmpb", OPTION_MASK_CMPB, false, true },
{ "crypto", OPTION_MASK_CRYPTO, false, true },
+ { "dense-math", OPTION_MASK_DENSE_MATH, false, true },
{ "direct-move", 0, false, true },
{ "dlmzb", OPTION_MASK_DLMZB, false, true },
{ "efficient-unaligned-vsx", OPTION_MASK_EFFICIENT_UNALIGNED_VSX,
@@ -27480,9 +27627,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
unsigned offset = 0;
unsigned size = GET_MODE_SIZE (reg_mode);
- /* If we are reading an accumulator register, we have to
- deprime it before we can access it. */
- if (TARGET_MMA
+ /* If we are reading an accumulator register, we have to deprime it
+ before we can access it unless we have dense math registers. */
+ if (TARGET_MMA_NO_DENSE_MATH
&& GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
@@ -27514,9 +27661,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
emit_insn (gen_rtx_SET (dst2, src2));
}
- /* If we are writing an accumulator register, we have to
- prime it after we've written it. */
- if (TARGET_MMA
+ /* If we are writing an accumulator register, we have to prime it
+ after we've written it unless we have dense math registers. */
+ if (TARGET_MMA_NO_DENSE_MATH
&& GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
@@ -27530,7 +27677,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
|| XINT (src, 1) == UNSPECV_MMA_ASSEMBLE);
gcc_assert (REG_P (dst));
if (GET_MODE (src) == XOmode)
- gcc_assert (FP_REGNO_P (REGNO (dst)));
+ gcc_assert ((TARGET_DENSE_MATH
+ ? VSX_REGNO_P (REGNO (dst))
+ : FP_REGNO_P (REGNO (dst))));
if (GET_MODE (src) == OOmode)
gcc_assert (VSX_REGNO_P (REGNO (dst)));
@@ -27583,9 +27732,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
emit_insn (gen_rtx_SET (dst_i, op));
}
- /* We are writing an accumulator register, so we have to
- prime it after we've written it. */
- if (GET_MODE (src) == XOmode)
+ /* We are writing an accumulator register, so we have to prime it
+ after we've written it unless we have dense math registers. */
+ if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
emit_insn (gen_mma_xxmtacc (dst, dst));
return;
@@ -27596,9 +27745,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
{
- /* If we are reading an accumulator register, we have to
- deprime it before we can access it. */
- if (TARGET_MMA
+ /* If we are reading an accumulator register, we have to deprime it
+ before we can access it unless we have dense math registers. */
+ if (TARGET_MMA_NO_DENSE_MATH
&& GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
@@ -27624,9 +27773,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
i * reg_mode_size)));
}
- /* If we are writing an accumulator register, we have to
- prime it after we've written it. */
- if (TARGET_MMA
+ /* If we are writing an accumulator register, we have to prime it after
+ we've written it unless we have dense math registers. */
+ if (TARGET_MMA_NO_DENSE_MATH
&& GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
}
@@ -27761,9 +27910,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
}
- /* If we are reading an accumulator register, we have to
- deprime it before we can access it. */
- if (TARGET_MMA && REG_P (src)
+ /* If we are reading an accumulator register, we have to deprime it
+ before we can access it unless we have dense math registers. */
+ if (TARGET_MMA_NO_DENSE_MATH && REG_P (src)
&& GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
@@ -27793,9 +27942,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
j * reg_mode_size)));
}
- /* If we are writing an accumulator register, we have to
- prime it after we've written it. */
- if (TARGET_MMA && REG_P (dst)
+ /* If we are writing an accumulator register, we have to prime it after
+ we've written it unless we have dense math registers. */
+ if (TARGET_MMA_NO_DENSE_MATH && REG_P (dst)
&& GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index d1f953630f7..169d81e208e 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -556,6 +556,9 @@ extern int rs6000_vector_align[];
#define TARGET_DIRECT_MOVE_64BIT (TARGET_DIRECT_MOVE \
&& TARGET_POWERPC64)
+/* Whether we have MMA support without dense math support. */
+#define TARGET_MMA_NO_DENSE_MATH (TARGET_MMA && !TARGET_DENSE_MATH)
+
/* Inlining allows targets to define the meanings of bits in target_info
field of ipa_fn_summary by itself, the used bits for rs6000 are listed
below. */
@@ -653,6 +656,7 @@ extern unsigned char rs6000_recip_bits[];
#define UNITS_PER_FP_WORD 8
#define UNITS_PER_ALTIVEC_WORD 16
#define UNITS_PER_VSX_WORD 16
+#define UNITS_PER_DMF_WORD 128
/* Type used for ptrdiff_t, as a string used in a declaration. */
#define PTRDIFF_TYPE "int"
@@ -766,7 +770,7 @@ enum data_align { align_abi, align_opt, align_both };
Another pseudo (not included in DWARF_FRAME_REGISTERS) is soft frame
pointer, which is eventually eliminated in favor of SP or FP. */
-#define FIRST_PSEUDO_REGISTER 111
+#define FIRST_PSEUDO_REGISTER 119
/* Use standard DWARF numbering for DWARF debugging information. */
#define DEBUGGER_REGNO(REGNO) rs6000_debugger_regno ((REGNO), 0)
@@ -803,7 +807,9 @@ enum data_align { align_abi, align_opt, align_both };
/* cr0..cr7 */ \
0, 0, 0, 0, 0, 0, 0, 0, \
/* vrsave vscr sfp */ \
- 1, 1, 1 \
+ 1, 1, 1, \
+ /* DMF registers. */ \
+ 0, 0, 0, 0, 0, 0, 0, 0 \
}
/* Like `CALL_USED_REGISTERS' except this macro doesn't require that
@@ -827,7 +833,9 @@ enum data_align { align_abi, align_opt, align_both };
/* cr0..cr7 */ \
1, 1, 0, 0, 0, 1, 1, 1, \
/* vrsave vscr sfp */ \
- 0, 0, 0 \
+ 0, 0, 0, \
+ /* DMF registers. */ \
+ 0, 0, 0, 0, 0, 0, 0, 0 \
}
#define TOTAL_ALTIVEC_REGS (LAST_ALTIVEC_REGNO - FIRST_ALTIVEC_REGNO + 1)
@@ -864,6 +872,7 @@ enum data_align { align_abi, align_opt, align_both };
v2 (not saved; incoming vector arg reg; return value)
v19 - v14 (not saved or used for anything)
v31 - v20 (saved; order given to save least number)
+ dmr0 - dmr7 (not saved)
vrsave, vscr (fixed)
sfp (fixed)
*/
@@ -906,6 +915,9 @@ enum data_align { align_abi, align_opt, align_both };
66, \
83, 82, 81, 80, 79, 78, \
95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, \
+ /* DMF registers. */ \
+ 111, 112, 113, 114, 115, 116, 117, 118, \
+ /* Vrsave, vscr, sfp. */ \
108, 109, \
110 \
}
@@ -932,6 +944,9 @@ enum data_align { align_abi, align_opt, align_both };
/* True if register is a VSX register. */
#define VSX_REGNO_P(N) (FP_REGNO_P (N) || ALTIVEC_REGNO_P (N))
+/* True if register is a DMF register. */
+#define DMF_REGNO_P(N) ((N) >= FIRST_DMF_REGNO && (N) <= LAST_DMF_REGNO)
+
/* Alternate name for any vector register supporting floating point, no matter
which instruction set(s) are available. */
#define VFLOAT_REGNO_P(N) \
@@ -1069,6 +1084,7 @@ enum reg_class
FLOAT_REGS,
ALTIVEC_REGS,
VSX_REGS,
+ DM_REGS,
VRSAVE_REGS,
VSCR_REGS,
GEN_OR_FLOAT_REGS,
@@ -1098,6 +1114,7 @@ enum reg_class
"FLOAT_REGS", \
"ALTIVEC_REGS", \
"VSX_REGS", \
+ "DM_REGS", \
"VRSAVE_REGS", \
"VSCR_REGS", \
"GEN_OR_FLOAT_REGS", \
@@ -1132,6 +1149,8 @@ enum reg_class
{ 0x00000000, 0x00000000, 0xffffffff, 0x00000000 }, \
/* VSX_REGS. */ \
{ 0x00000000, 0xffffffff, 0xffffffff, 0x00000000 }, \
+ /* DM_REGS. */ \
+ { 0x00000000, 0x00000000, 0x00000000, 0x007f8000 }, \
/* VRSAVE_REGS. */ \
{ 0x00000000, 0x00000000, 0x00000000, 0x00001000 }, \
/* VSCR_REGS. */ \
@@ -1159,7 +1178,7 @@ enum reg_class
/* CA_REGS. */ \
{ 0x00000000, 0x00000000, 0x00000000, 0x00000004 }, \
/* ALL_REGS. */ \
- { 0xffffffff, 0xffffffff, 0xffffffff, 0x00007fff } \
+ { 0xffffffff, 0xffffffff, 0xffffffff, 0x007fffff } \
}
/* The same information, inverted:
@@ -2060,7 +2079,16 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */
&rs6000_reg_names[108][0], /* vrsave */ \
&rs6000_reg_names[109][0], /* vscr */ \
\
- &rs6000_reg_names[110][0] /* sfp */ \
+ &rs6000_reg_names[110][0], /* sfp */ \
+ \
+ &rs6000_reg_names[111][0], /* dmr0 */ \
+ &rs6000_reg_names[112][0], /* dmr1 */ \
+ &rs6000_reg_names[113][0], /* dmr2 */ \
+ &rs6000_reg_names[114][0], /* dmr3 */ \
+ &rs6000_reg_names[115][0], /* dmr4 */ \
+ &rs6000_reg_names[116][0], /* dmr5 */ \
+ &rs6000_reg_names[117][0], /* dmr6 */ \
+ &rs6000_reg_names[118][0], /* dmr7 */ \
}
/* Table of additional register names to use in user input. */
@@ -2114,6 +2142,8 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */
{"vs52", 84}, {"vs53", 85}, {"vs54", 86}, {"vs55", 87}, \
{"vs56", 88}, {"vs57", 89}, {"vs58", 90}, {"vs59", 91}, \
{"vs60", 92}, {"vs61", 93}, {"vs62", 94}, {"vs63", 95}, \
+ {"dmr0", 111}, {"dmr1", 112}, {"dmr2", 113}, {"dmr3", 114}, \
+ {"dmr4", 115}, {"dmr5", 116}, {"dmr6", 117}, {"dmr7", 118}, \
}
/* This is how to output an element of a case-vector that is relative. */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ff085bf9bb1..0717e86e9d6 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -51,6 +51,8 @@ (define_constants
(VRSAVE_REGNO 108)
(VSCR_REGNO 109)
(FRAME_POINTER_REGNUM 110)
+ (FIRST_DMF_REGNO 111)
+ (LAST_DMF_REGNO 118)
])
;;
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 7c4f0375424..72578644037 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -639,6 +639,10 @@ mfuture
Target Undocumented Mask(FUTURE) Var(rs6000_isa_flags) Warn(Do not use %<-mfuture>, use %<-mcpu=future>)
Generate (do not generate) potential future instructions.
+mdense-math
+Target Undocumented Mask(DENSE_MATH) Var(rs6000_isa_flags)
+Generate (do not generate) dense math MMA+ instructions.
+
; Documented parameters
-param=rs6000-vect-unroll-limit=
--
2.51.1
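As a cross-check of the new REG_CLASS_CONTENTS entries in rs6000.h, here is a
minimal sketch that recomputes the fourth mask word from the register numbers
in this patch. It assumes the usual layout in which hard register N is bit
(N % 32) of 32-bit word (N / 32). The helpers regset_word and check_dmf_masks
are hypothetical names used only for illustration; they are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers taken from the rs6000.md hunk in this patch.  */
#define FIRST_DMF_REGNO 111
#define LAST_DMF_REGNO  118

/* Hypothetical helper (not a GCC function): return 32-bit word WORD of a
   register-set bitmask in which hard registers FIRST..LAST (inclusive)
   are set.  Assumes register N lives in bit N % 32 of word N / 32.  */
static uint32_t
regset_word (int first, int last, int word)
{
  uint32_t mask = 0;

  for (int regno = first; regno <= last; regno++)
    if (regno / 32 == word)
      mask |= UINT32_C (1) << (regno % 32);

  return mask;
}

/* Hypothetical self-check of the masks used in the patch.  */
static void
check_dmf_masks (void)
{
  /* DM_REGS: registers 111..118 are bits 15..22 of word 3.  */
  assert (regset_word (FIRST_DMF_REGNO, LAST_DMF_REGNO, 3)
	  == UINT32_C (0x007f8000));

  /* ALL_REGS: word 3 now covers registers 96..118, i.e. bits 0..22.  */
  assert (regset_word (0, LAST_DMF_REGNO, 3) == UINT32_C (0x007fffff));

  /* No DMF register falls into word 2 (registers 64..95).  */
  assert (regset_word (FIRST_DMF_REGNO, LAST_DMF_REGNO, 2) == 0);
}
```

Calling check_dmf_masks () from a small test program should pass silently if
the masks agree with FIRST_DMF_REGNO and LAST_DMF_REGNO. The same helper can be
reused if the register numbering changes in a later revision of the patch.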
--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: [email protected]