https://gcc.gnu.org/g:2dc9c6f2061f9853336134979f64f7359ee272d8

commit 2dc9c6f2061f9853336134979f64f7359ee272d8
Author: Michael Meissner <[email protected]>
Date:   Wed Jul 1 10:47:34 2026 -0400

    Add dense math register support.
    
    This patch is a modification of the V6 patches that I sent out on April
    21st, 2026.
    
    In particular, I made the changes in relation to the comments posted in
    February that I didn't fully address previously.
    
    Here is the comment from February:
    
      * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/708071.html
    
    Here is my reply:
    
      * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/715248.html
    
    Here are the V6 patches posted on April 21st, 2026:
    
      * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713352.html
      * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713353.html
      * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713354.html
      * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713356.html
      * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713357.html
    
    There are 7 patches in this patch set:
    
    Patch #1 adds the wD constraint and the accumulator_operand predicate.
    
    Patch #2 switches mma.md to use the wD constraint and accumulator_operand
    predicate.
    
    Patch #3 adds the -mdense-math option, but in this patch, -mdense-math is
    not implemented.
    
    Patch #4 adds support for 512-bit dense math registers.
    
    Patch #5 adds support for 1,024-bit dense math registers.
    
    Patch #6 is an optional patch that changes the name of the MMA instructions
    from the original name used in the power10/power11 timeline to a new
    alternate name that has 'dm' (for dense math) in the instruction name.
    Note, this is a new patch for the V7 patch set.
    
    Patch #7 clones the mma builtin tests to test the code generation of MMA
    instructions if -mcpu=future is used.  Note, this is a new patch for the V7
    patch set.  If patch #6 is not applied, this patch will need to be modified.
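
    As a rough illustration of the renaming in patch #6, the sketch below
    builds both spellings of a mnemonic.  It only models the naming rule
    described in this message ('dm' placed in front of the instruction name,
    after any 'pm' prefix).  The helper name mma_mnemonic and its parameters
    are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the assembler mnemonic for an MMA instruction.
   BASE is the ISA 3.1 name without the 'pm' prefix, e.g. "xvf32ger".
   PREFIXED selects the 'pm' form of the instruction.
   DENSE_MATH models -mdense-math being in effect.
   This is a hypothetical helper, not code from the patch.  */
static void
mma_mnemonic (char *buf, size_t len, const char *base,
              int prefixed, int dense_math)
{
  /* Room for "pm" + "dm" + BASE + the terminating NUL.  */
  assert (buf != NULL && base != NULL);
  assert (len > strlen (base) + 4);

  snprintf (buf, len, "%s%s%s",
            prefixed ? "pm" : "",
            dense_math ? "dm" : "",
            base);
}
```

    With dense_math set to 0 the helper returns the power10/power11 names
    unchanged, which corresponds to not applying patch #6.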
    
    The following is the description of dense math registers from previous
    versions of the patches.
    
    The Dense Math Facility (dmf) is designed to be an extension to the ISA
    3.1 (i.e. power10/power11) MMA facility.  Now, since these are future
    patches, the Dense Math Facility might appear in future PowerPC
    machines or maybe it won't be used in real hardware.
    
    One of the concepts of the DMF system is that the accumulators used in
    the MMA and the DMF extensions will become separate registers, rather
    than being overlaid over the traditional floating point registers
    (i.e. VSX registers 0..31).
    
    In addition to being separate registers, the dense math accumulators
    are now logically 1,024 bits instead of 512.
    
    The way the Dense Math registers and instructions are designed,
    existing power10/power11 MMA instructions that operate on 512 bits will
    work with Dense Math.  In ISA 3.1, each of the 8 accumulators is
    overlaid over 4 adjacent FPR registers, and the compiler must not touch
    the 4 adjacent FPRs while the MMA accumulator is used.
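
    The overlay arithmetic can be sketched as follows.  This assumes that
    accumulator n covers VSX registers 4n through 4n+3; the message only says
    "4 adjacent FPR registers", so the exact numbering is an assumption here,
    and all of the names below are hypothetical.

```c
#include <assert.h>

enum { NUM_ACCUMULATORS = 8, VSRS_PER_ACCUMULATOR = 4 };

/* First VSX register overlaid by accumulator ACC (assumed 4 * ACC).  */
static int
acc_first_vsr (int acc)
{
  assert (acc >= 0 && acc < NUM_ACCUMULATORS);
  return acc * VSRS_PER_ACCUMULATOR;
}

/* Last VSX register overlaid by accumulator ACC.  */
static int
acc_last_vsr (int acc)
{
  return acc_first_vsr (acc) + VSRS_PER_ACCUMULATOR - 1;
}

/* Number of VSX registers the compiler must leave alone when the
   accumulators selected by LIVE_MASK (bit N = accumulator N) are in use
   and the accumulators are overlaid on the VSX registers.  */
static int
reserved_vsrs (unsigned live_mask)
{
  int count = 0;
  for (int acc = 0; acc < NUM_ACCUMULATORS; acc++)
    if (live_mask & (1u << acc))
      count += VSRS_PER_ACCUMULATOR;
  return count;
}
```

    With all 8 accumulators live, 8 * 4 = 32 VSX registers are reserved,
    which is the full range 0..31.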
    
    In the Dense Math system, the accumulator is a separate register.  When
    -mcpu=power11 or -mcpu=power10 is used, the GCC compiler will not
    allocate the appropriate FPR (VSX) registers when generating MMA
    instructions.
    
    If a function compiled for Power10/Power11 is run on a system with
    Dense Math support enabled, the effect is that a bunch of the FPR registers
    will not be allocated because the compiler assumes the accumulators are
    there.  After these patches are applied, if the user compiles the code
    with -mcpu=future, the compiler can allocate up to 32 more vector
    registers, because the Dense Math accumulators are separate registers.
    
    In fact two of the MMA tests (mma-double-test.c and mma-single-test.c)
    do about 20 fewer spills of floating point values to the stack, since
    the compiler can allocate those FPR vector registers for other
    purposes.
    
    These 5 patches will allow GCC to allocate these registers if the
    -mcpu=future option is used.
    
      1: The first patch adds a new constraint (%wD) that can be used by
         code generating MMA instructions. If the user used -mcpu=power10
         or -mcpu=power11, %wD will act like %d and insist the register be
         VSX registers 0..31.  If the user used -mcpu=future, the new
         separate dense math accumulators will be used.
    
      2: This patch just adds the -mdense-math option, but it does not add
         support for dense math registers until patch #3.
    
      3: This patch adds the support for the current MMA 512-bit
         instructions to use separate accumulators instead of overlaid VSX
         registers.
    
      4: This patch adds support for an extension to MMA where the
         accumulators grow to 1,024 bits instead of 512 bits.
    
      5: This patch is an optional patch that adds comments to the various
         MMA insn that explain what MMA instructions are generated by the
         particular insn.
    
    This patch is the foundation for the Dense Math support.  It is
    expected other patches may be added to this to support potential new
    features added to the Dense Math Facility.
    
    I have built bootstrap little endian compilers on power10 systems, and
    big endian compilers on power9 systems.  There were no regressions in the
    tests.  Can I add the patches to the GCC trunk after the -mcpu=future
    patch is applied and GCC 17 has opened up?
    
    2026-07-01  Michael Meissner  <[email protected]>
    
    gcc/
    
            * config/rs6000/constraints.md (wD): New constraint.
            * config/rs6000/predicates.md (accumulator_operand): New predicate.
            * config/rs6000/rs6000.cc (rs6000_debug_reg_global): Print the
            register class for the 'wD' constraint.
            (rs6000_init_hard_regno_mode_ok): Set up the 'wD' register
            constraint class.
            * config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add element
            for the 'wD' constraint.
            * doc/md.texi (PowerPC constraints): Document the 'wD' constraint.
    
    2026-07-01  Michael Meissner  <[email protected]>
    
    gcc/
    
            * config/rs6000/mma.md (mma_<vv>): Use the wD constraint and
            accumulator_operand predicate for all MMA instructions taking
            accumulator operands.
            (mma_<avv>): Likewise.
            (mma_<pv>): Likewise.
            (mma_<apv>): Likewise.
            (mma_<vvi4i4i8>): Likewise.
            (mma_<avvi4i4i8>): Likewise.
            (mma_<vvi4i4i2>): Likewise.
            (mma_<avvi4i4i2>): Likewise.
            (mma_<vvi4i4>): Likewise.
            (mma_<avvi4i4>): Likewise.
            (mma_<pvi4i2>): Likewise.
            (mma_<apvi4i2>): Likewise.
            (mma_<vvi4i4i4>): Likewise.
            (mma_<avvi4i4i4>): Likewise.
    
    2026-07-01  Michael Meissner  <[email protected]>
    
    gcc/
    
            * config/rs6000/rs6000-c.cc (rs6000_define_or_undefine_macro):
            Define __MMA_DENSE_MATH__ if we have MMA that uses dense math
            register accumulators.  Define __MMA_NO_DENSE_MATH__ if we have MMA
            but we are using ISA 3.1 where the accumulators are overlaid over
            VSX registers 0..31.  Define __DENSE_MATH__ if we have dense math
            registers.
            * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do not
            allow -mdense-math unless -mcpu=future is used.
            (rs6000_opt_masks): Add -mdense-math support.
            * config/rs6000/rs6000.h (TARGET_MMA_DENSE_MATH): New macro.
            (TARGET_MMA_NO_DENSE_MATH): Likewise.
            * config/rs6000/rs6000.opt (-mdense-math): New option.
            * doc/invoke.texi (RS/6000 and PowerPC Options): Add -mdense-math.
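
    User code could test the macros listed in this entry along the lines of
    the sketch below.  Only the three macro names come from the ChangeLog
    entry; the function names and the returned strings are hypothetical, and
    on a compiler without these patches the code falls through to the
    "none" case.

```c
#include <assert.h>
#include <string.h>

/* Describe how MMA accumulators are provided for this compilation.  */
static const char *
mma_accumulator_kind (void)
{
#if defined (__MMA_DENSE_MATH__)
  /* MMA with separate dense math accumulators.  */
  return "dense-math";
#elif defined (__MMA_NO_DENSE_MATH__)
  /* ISA 3.1 MMA: accumulators overlaid on the VSX registers.  */
  return "overlaid";
#else
  /* No MMA support, or a compiler that does not define the macros.  */
  return "none";
#endif
}

/* Nonzero if the accumulators are separate registers, so no VSX
   registers need to be reserved for them.  */
static int
accumulators_are_separate (void)
{
  const char *kind = mma_accumulator_kind ();
  assert (kind != NULL);
  return strcmp (kind, "dense-math") == 0;
}
```
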
    
    2026-07-01  Michael Meissner  <[email protected]>
    
    gcc/
    
            * config/rs6000/mma.md (movoo): Allow -mdense-math -mno-mma.
            (movxo): Convert to being a define_expand that can handle both the
            original MMA support without dense math registers, and adding dense
            math support.  Allow -mdense-math -mno-mma.
            (movxo_nodm): Rename original movxo insn, and restrict this insn to
            when we do not have dense math registers.
            (movxo_dm): New define_insn_and_split for dense math registers.
            (vsx_assemble_pair): Allow -mdense-math -mno-mma.
            (vsx_disassemble_pair): Likewise.
            (mma_assemble_acc): Likewise.
            (mma_disassemble_acc): Likewise.
            (mma_<acc>): Allow built-ins to be used if -mdense-math.
            (mma_xxsetaccz): Convert into a define_expand to handle both
            non-dense math and dense math registers.
            (mma_xxsetaccz_nodm): Rename from mma_xxsetaccz and limit code to
            non-dense math systems.
            (mma_xxsetaccz_dm): New insn for dense math register support.
            * config/rs6000/predicates.md (dmf_register_operand): New predicate.
            (accumulator_operand): Add support for dense math registers.
            * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin):
            Do not issue xxmfacc (deprime) instruction if we have dense math
            registers.
            * config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): Add
            -mdense-math.
            (POWERPC_MASKS): Likewise.
            * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add dense math
            register support.
            (enum rs6000_reload_reg_type): Likewise.
            (LAST_RELOAD_REG_CLASS): Likewise.
            (reload_reg_map): Likewise.
            (rs6000_reg_names): Likewise.
            (alt_reg_names): Likewise.
            (rs6000_hard_regno_nregs_internal): Likewise.
            (rs6000_hard_regno_mode_ok_uncached): Likewise.
            (rs6000_debug_reg_global): Likewise.
            (rs6000_setup_reg_addr_masks): Likewise.
            (rs6000_init_hard_regno_mode_ok): Likewise.
            (rs6000_secondary_reload_memory): Likewise.
            (rs6000_secondary_reload_simple_move): Likewise.
            (rs6000_preferred_reload_class): Likewise.
            (rs6000_secondary_reload_class): Likewise.
            (print_operand): Likewise.
            (rs6000_dmf_register_move_cost): New helper function.
            (rs6000_register_move_cost): Add dense math register support.
            (rs6000_memory_move_cost): Likewise.
            (rs6000_compute_pressure_classes): Likewise.
            (rs6000_debugger_regno): Likewise.
            (rs6000_opt_masks): Likewise.
            (rs6000_split_multireg_move): Likewise.
            * config/rs6000/rs6000.h (UNITS_PER_DMF_WORD): New macro.
            (FIRST_PSEUDO_REGISTER): Add dense math register support.
            (FIXED_REGISTERS): Likewise.
            (CALL_REALLY_USED_REGISTERS): Likewise.
            (REG_ALLOC_ORDER): Likewise.
            (DMF_REGNO_P): New macro.
            (enum reg_class): Add dense math register support.
            (REG_CLASS_NAMES): Likewise.
            (REGISTER_NAMES): Likewise.
            (ADDITIONAL_REGISTER_NAMES): Likewise.
            * config/rs6000/rs6000.md (FIRST_DMF_REGNO): New constant.
            (LAST_DMF_REGNO): Likewise.
    
    2026-07-01  Michael Meissner  <[email protected]>
    
    gcc/
    
            * config/rs6000/mma.md (UNSPEC_DMF_INSERT512_UPPER): New unspec.
            (UNSPEC_DMF_INSERT512_LOWER): Likewise.
            (UNSPEC_DMF_EXTRACT512): Likewise.
            (UNSPEC_DMF_RELOAD_FROM_MEMORY): Likewise.
            (UNSPEC_DMF_RELOAD_TO_MEMORY): Likewise.
            (movtdo): New define_expand and define_insn_and_split to implement
            1,024 bit dense math registers.
            (movtdo_insert512_upper): New insn.
            (movtdo_insert512_lower): Likewise.
            (movtdo_extract512): Likewise.
            (reload_tdo_from_memory): Likewise.
            (reload_tdo_to_memory): Likewise.
            * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add dense
            math register support.
            (rs6000_init_builtins): Add support for __dm1024 keyword.
            * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add
            support for TDOmode.
            (rs6000_function_arg): Likewise.
            * config/rs6000/rs6000-modes.def (TDOmode): New mode.
            * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Add
            support for TDOmode.
            (rs6000_hard_regno_mode_ok): Likewise.
            (rs6000_modes_tieable_p): Likewise.
            (rs6000_debug_reg_global): Likewise.
            (rs6000_setup_reg_addr_masks): Likewise.
            (rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup
            reload hooks for dense math TDO reload mode.
            (reg_offset_addressing_ok_p): Add support for TDOmode.
            (rs6000_emit_move): Likewise.
            (rs6000_secondary_reload_simple_move): Likewise.
            (rs6000_preferred_reload_class): Likewise.
            (rs6000_mangle_type): Add mangling for __dm1024 type.
            (rs6000_dmf_register_move_cost): Add support for TDOmode.
            (rs6000_split_multireg_move): Likewise.
            (rs6000_invalid_conversion): Likewise.
            * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
            (enum rs6000_builtin_type_index): Add dense math register type
            nodes.
            (dm1024_type_node): Likewise.
            (ptr_dm1024_type_node): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/dm-1024bit.c: New test.
    
    2026-07-01  Michael Meissner  <[email protected]>
    
    gcc/
    
            * config/rs6000/mma.md (vvi4i4i8): Eliminate using the 'pm' prefix
            here, so we can emit pmdm* on dense math systems.
            (avvi4i4i8): Likewise.
            (vvi4i4i2): Likewise.
            (avvi4i4i2): Likewise.
            (vvi4i4): Likewise.
            (avvi4i4): Likewise.
            (pvi4i2): Likewise.
            (apvi4i2): Likewise.
            (vvi4i4i4): Likewise.
            (mma_<vv>): If -mdense-math, emit 'dmxv*' form of the instruction
            instead of 'xv*'.
            (mma_<avv>): Likewise.
            (mma_<pv>): Likewise.
            (mma_<apv>): Likewise.
            (mma_pm<vvi4i4i8>): If -mdense-math, emit 'pmdm*' instead of 'pm*'.
            (mma_pm<avvi4i4i8>): Likewise.
            (mma_pm<vvi4i4i2>): Likewise.
            (mma_pm<avvi4i4i2>): Likewise.
            (mma_pm<vvi4i4>): Likewise.
            (mma_pm<avvi4i4>): Likewise.
            (mma_pm<pvi4i2>): Likewise.
            (mma_pm<apvi4i2>): Likewise.
            (mma_pm<vvi4i4i4>): Likewise.
            (mma_pm<avvi4i4i4>): Likewise.
            * config/rs6000/rs6000.cc (print_operand): For %!, print 'dm' if
            -mdense-math.
            * config/rs6000/rs6000.h (PRINT_OPERAND_PUNCT_VALID_P): Allow %!.
    
    2026-07-01  Michael Meissner  <[email protected]>
    
    gcc/testsuite/
    
            * gcc.target/powerpc/dm-builtin-1.c: New test.
            * gcc.target/powerpc/dm-builtin-10-pair.c: Likewise.
            * gcc.target/powerpc/dm-builtin-10-quad.c: Likewise.
            * gcc.target/powerpc/dm-builtin-2.c: Likewise.
            * gcc.target/powerpc/dm-builtin-3.c: Likewise.
            * gcc.target/powerpc/dm-builtin-4.c: Likewise.
            * gcc.target/powerpc/dm-builtin-5.c: Likewise.
            * gcc.target/powerpc/dm-builtin-6.c: Likewise.
            * gcc.target/powerpc/dm-builtin-7.c: Likewise.
            * gcc.target/powerpc/dm-builtin-8.c: Likewise.
            * gcc.target/powerpc/dm-builtin-9.c: Likewise.

Diff:
---
 gcc/config/rs6000/rs6000.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 9a6c142354bb..07b6c690ad5f 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -2036,7 +2036,7 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
       || mode1 == TDOmode || mode2 == PTImode || mode2 == OOmode
       || mode2 == XOmode || mode2 == TDOmode
-      || FP16_SCALAR_MODE_P (mode1) || || FP16_SCALAR_MODE_P (mode2))
+      || FP16_SCALAR_MODE_P (mode1) || FP16_SCALAR_MODE_P (mode2))
     return mode1 == mode2;
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
