[gcc(refs/users/meissner/heads/work161-dmf)] Add support for dense math registers.
https://gcc.gnu.org/g:0811bcd7ff4d5a3c625b1cd82faedc6d34e8976b commit 0811bcd7ff4d5a3c625b1cd82faedc6d34e8976b Author: Michael Meissner Date: Tue Mar 5 15:00:44 2024 -0500 Add support for dense math registers. The MMA subsystem added the notion of accumulator registers as an optional feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with the VSX registers 0..31, but logically the accumulator registers were separate from the FPR registers. In ISA 3.1, it was anticipated that in future systems, the accumulator registers may not overlap with the FPR registers. This patch adds the support for dense math registers as separate registers. This particular patch does not change the MMA support to use the accumulators within the dense math registers. This patch just adds the basic support for having separate DMRs. The next patch will switch the MMA support to use the accumulators if -mcpu=future is used. For testing purposes, I added an undocumented option '-mdense-math' to enable or disable the dense math support. This patch adds a new constraint (wD). If MMA is selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint will allow access to accumulators that overlap with VSX registers 0..31. If both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint will only allow dense math registers. This patch modifies the existing %A output modifier. If MMA is selected but dense math is not selected, then the %A output modifier converts the VSX register number to the accumulator number, by dividing it by 4. If both MMA and dense math are selected, then %A will map the separate DMR registers into 0..7. 
The intention is that user code using extended asm can be modified to run on both MMA without dense math and MMA with dense math: 1) If possible, don't use extended asm, but instead use the MMA built-in functions; 2) If you do need to write extended asm, change the d constraints targeting accumulators to wD; 3) Only use the built-in zero, assemble and disassemble functions to move data between vector quad types and dense math accumulators. I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in the extended asm code. The reason is these instructions assume there is a 1-to-1 correspondence between 4 adjacent FPR registers and an accumulator that overlaps with those registers. With accumulators now being separate registers, there no longer is a 1-to-1 correspondence. It is possible that the mangling for DMRs and the GDB register numbers may produce other changes in the future. 2024-03-05 Michael Meissner * config/rs6000/mma.md (movxo): Add comments about dense math registers. (movxo_nodm): Rename from movxo and restrict the usage to machines without dense math registers. (movxo_dm): New insn for movxo support for machines with dense math registers. (mma_): Restrict usage to machines without dense math registers. (mma_xxsetaccz): Make a define_expand, and add support for dense math registers. (mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to machines without dense math registers. (mma_dmsetaccz): New insn. * config/rs6000/predicates.md (dmr_operand): New predicate. (accumulator_operand): Add support for dense math registers. * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do not de-prime accumulator when disassembling a vector quad. * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE. (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR. (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD constraint. (reload_reg_map): Likewise. (rs6000_reg_names): Likewise. 
(alt_reg_names): Likewise. (rs6000_hard_regno_nregs_internal): Likewise. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_debug_reg_global): Likewise. (rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Likewise. (rs6000_secondary_reload_memory): Add support for DMR registers. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_preferred_reload_class): Likewise. (rs6000_secondary_reload_class): Likewise. (print_operand): Make %A handle both FPRs and DMRs. (rs6000_dmr_register_move_cost): New helper function. (rs6000_register_move_cost): Add support for DMR registers.
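The %A mapping and the wD behaviour described in the commit message above can be illustrated with a small stand-alone model. This is only a sketch, not code from the patch: the register numbers, the MODEL_* constants and the model_* function names are all hypothetical, and the check that a VSX register number must be a multiple of 4 is an assumption of the sketch rather than something stated in the message.

```c
#include <stdbool.h>

/* Hypothetical register numbers, chosen only for this sketch.  The real
   numbering is defined in GCC's rs6000 back end and is not shown here.  */
enum
{
  MODEL_FIRST_FPR = 32,   /* assumed number of VSX register 0 */
  MODEL_FIRST_DMR = 200,  /* assumed number of dense math register 0 */
  MODEL_NUM_ACC = 8       /* accumulators are numbered 0..7 */
};

/* Return the accumulator number that %A would print in this model, or -1
   if REGNO cannot name an accumulator.  */
int
model_print_A (int regno, bool dense_math)
{
  if (dense_math)
    {
      /* Separate DMRs map directly onto 0..7.  */
      int dmr = regno - MODEL_FIRST_DMR;
      return (dmr >= 0 && dmr < MODEL_NUM_ACC) ? dmr : -1;
    }

  /* Without dense math an accumulator overlaps 4 adjacent VSX registers
     in 0..31, so the accumulator number is the VSX number divided by 4.
     Requiring a multiple of 4 is an assumption of this sketch.  */
  int vsx = regno - MODEL_FIRST_FPR;
  if (vsx < 0 || vsx >= 4 * MODEL_NUM_ACC || vsx % 4 != 0)
    return -1;
  return vsx / 4;
}

/* Simplified view of the wD constraint: it accepts exactly the registers
   that can hold an accumulator for the selected configuration.  */
bool
model_wD_allows (int regno, bool mma, bool dense_math)
{
  if (!mma)
    return false;
  return model_print_A (regno, dense_math) >= 0;
}
```

In this model the same register is accepted by wD in one configuration and rejected in the other, which is why the message recommends the built-in functions over hand-written extended asm where possible.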
[gcc(refs/users/meissner/heads/work161-dmf)] PowerPC: Switch to dense math names for all MMA operations.
https://gcc.gnu.org/g:d6a52b18452ee2d086094e78dde287a44a53f91a commit d6a52b18452ee2d086094e78dde287a44a53f91a Author: Michael Meissner Date: Tue Mar 5 15:33:12 2024 -0500 PowerPC: Switch to dense math names for all MMA operations. This patch changes the assembler instruction names for MMA instructions from the original name used in power10 to the new name when used with the dense math system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the same bits for either spelling. For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the instruction. However, the prefixed instructions have a 'pm' prefix, and we add the 'dm' prefix afterwards. To prevent having two sets of parallel int attributes, we remove the "pm" prefix from the instruction string in the attributes, and add it later, both in the insn name and in the output template. 2024-03-05 Michael Meissner gcc/ * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a "pm" prefix. (avvi4i4i8): Likewise. (vvi4i4i2): Likewise. (avvi4i4i2): Likewise. (vvi4i4): Likewise. (avvi4i4): Likewise. (pvi4i2): Likewise. (apvi4i2): Likewise. (vvi4i4i4): Likewise. (avvi4i4i4): Likewise. (mma_xxsetaccz): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_pm): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm' prefixes based on whether we have the original MMA specification or if we have dense math support. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. 
Diff: --- gcc/config/rs6000/mma.md | 161 +++ 1 file changed, 107 insertions(+), 54 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 2ce613b46cc..f3870eac51a 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -224,44 +224,47 @@ (UNSPEC_MMA_XVF64GERNP "xvf64gernp") (UNSPEC_MMA_XVF64GERNN "xvf64gernn")]) -(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) +;; The "pm" prefix is not in these expansions, so that we can generate +;; pmdmxvi4ger8 on systems with dense math registers and xvi4ger8 on systems +;; without dense math registers. +(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")]) -(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "pmxvi4ger8pp")]) +(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "xvi4ger8pp")]) -(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2") -(UNSPEC_MMA_PMXVI16GER2S "pmxvi16ger2s") -(UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2") -(UNSPEC_MMA_PMXVBF16GER2 "pmxvbf16ger2")]) +(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"xvi16ger2") +(UNSPEC_MMA_PMXVI16GER2S "xvi16ger2s") +(UNSPEC_MMA_PMXVF16GER2"xvf16ger2") +(UNSPEC_MMA_PMXVBF16GER2 "xvbf16ger2")]) -(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") -(UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp") -(UNSPEC_MMA_PMXVF16GER2PP "pmxvf16ger2pp") -(UNSPEC_MMA_PMXVF16GER2PN "pmxvf16ger2pn") -(UNSPEC_MMA_PMXVF16GER2NP "pmxvf16ger2np") -(UNSPEC_MMA_PMXVF16GER2NN "pmxvf16ger2nn") -(UNSPEC_MMA_PMXVBF16GER2PP "pmxvbf16ger2pp") -(UNSPEC_MMA_PMXVBF16GER2PN "pmxvbf16ger2pn") -(UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np") -(UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")]) +(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "xvi16ger2pp") +(UNSPEC_MMA_PMXVI16GER2SPP "xvi16ger2spp") +(UNSPEC_MMA_PMXVF16GER2PP "xvf16ger2pp") +(UNSPEC
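The naming rule from the commit message above (an optional "pm" prefix first, then "dm" when dense math is in use, then the base name stored in the int attribute) can be sketched as a plain string operation. This is an illustration only; model_mma_mnemonic is a hypothetical helper and is not part of mma.md, where the same effect comes from the insn output templates.

```c
#include <stdbool.h>
#include <stdio.h>

/* Build the assembler mnemonic for BASE (for example "xvi4ger8") into BUF.
   PREFIXED selects the "pm" form, DENSE_MATH selects the "dm" spelling.
   Returns the value of snprintf, i.e. the length the full name needs.  */
int
model_mma_mnemonic (char *buf, size_t size, const char *base,
                    bool prefixed, bool dense_math)
{
  return snprintf (buf, size, "%s%s%s",
                   prefixed ? "pm" : "",
                   dense_math ? "dm" : "",
                   base);
}
```

With this helper, "xvf64gerpp" becomes "dmxvf64gerpp" for dense math, and the prefixed "xvi4ger8" becomes "pmxvi4ger8" or "pmdmxvi4ger8". Storing only the base name is what lets one set of int attributes serve both spellings.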
[gcc(refs/users/meissner/heads/work161-dmf)] Add dense math test for new instruction names.
https://gcc.gnu.org/g:e34b274be7e9f4df3d08109ee0f68da4f4493b15 commit e34b274be7e9f4df3d08109ee0f68da4f4493b15 Author: Michael Meissner Date: Tue Mar 5 15:41:37 2024 -0500 Add dense math test for new instruction names. 2024-03-05 Michael Meissner gcc/testsuite/ * gcc.target/powerpc/dm-double-test.c: New test. * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New target test. Diff: --- gcc/testsuite/gcc.target/powerpc/dm-double-test.c | 194 ++ gcc/testsuite/lib/target-supports.exp | 23 +++ 2 files changed, 217 insertions(+) diff --git a/gcc/testsuite/gcc.target/powerpc/dm-double-test.c b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c new file mode 100644 index 000..66c19779585 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c @@ -0,0 +1,194 @@ +/* Test derived from mma-double-1.c, modified for dense math. */ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_dense_math_ok } */ +/* { dg-options "-mdejagnu-cpu=future -O2" } */ + +#include +#include +#include + +typedef unsigned char vec_t __attribute__ ((vector_size (16))); +typedef double v4sf_t __attribute__ ((vector_size (16))); +#define SAVE_ACC(ACC, ldc, J) \ + __builtin_mma_disassemble_acc (result, ACC); \ + rowC = (v4sf_t *) &CO[0*ldc+J]; \ + rowC[0] += result[0]; \ + rowC = (v4sf_t *) &CO[1*ldc+J]; \ + rowC[0] += result[1]; \ + rowC = (v4sf_t *) &CO[2*ldc+J]; \ + rowC[0] += result[2]; \ + rowC = (v4sf_t *) &CO[3*ldc+J]; \ + rowC[0] += result[3]; + +void +DM (int m, int n, int k, double *A, double *B, double *C) +{ + __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; + v4sf_t result[4]; + v4sf_t *rowC; + for (int l = 0; l < n; l += 4) +{ + double *CO; + double *AO; + AO = A; + CO = C; + C += m * 4; + for (int j = 0; j < m; j += 16) + { + double *BO = B; + __builtin_mma_xxsetaccz (&acc0); + __builtin_mma_xxsetaccz (&acc1); + __builtin_mma_xxsetaccz (&acc2); + __builtin_mma_xxsetaccz (&acc3); + __builtin_mma_xxsetaccz (&acc4); + 
__builtin_mma_xxsetaccz (&acc5); + __builtin_mma_xxsetaccz (&acc6); + __builtin_mma_xxsetaccz (&acc7); + unsigned long i; + + for (i = 0; i < k; i++) + { + vec_t *rowA = (vec_t *) & AO[i * 16]; + __vector_pair rowB; + vec_t *rb = (vec_t *) & BO[i * 4]; + __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]); + __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]); + __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]); + __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]); + __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]); + __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]); + __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]); + __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]); + __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]); + } + SAVE_ACC (&acc0, m, 0); + SAVE_ACC (&acc2, m, 4); + SAVE_ACC (&acc1, m, 2); + SAVE_ACC (&acc3, m, 6); + SAVE_ACC (&acc4, m, 8); + SAVE_ACC (&acc6, m, 12); + SAVE_ACC (&acc5, m, 10); + SAVE_ACC (&acc7, m, 14); + AO += k * 16; + BO += k * 4; + CO += 16; + } + B += k * 4; +} +} + +void +init (double *matrix, int row, int column) +{ + for (int j = 0; j < column; j++) +{ + for (int i = 0; i < row; i++) + { + matrix[j * row + i] = (i * 16 + 2 + j) / 0.123; + } +} +} + +void +init0 (double *matrix, double *matrix1, int row, int column) +{ + for (int j = 0; j < column; j++) +for (int i = 0; i < row; i++) + matrix[j * row + i] = matrix1[j * row + i] = 0; +} + + +void +print (const char *name, const double *matrix, int row, int column) +{ + printf ("Matrix %s has %d rows and %d columns:\n", name, row, column); + for (int i = 0; i < row; i++) +{ + for (int j = 0; j < column; j++) + { + printf ("%f ", matrix[j * row + i]); + } + printf ("\n"); +} + printf ("\n"); +} + +int +main (int argc, char *argv[]) +{ + int rowsA, colsB, common; + int i, j, k; + int ret = 0; + + for (int t = 16; t <= 128; t += 16) +{ + for (int t1 = 4; t1 <= 16; t1 += 4) + { + rowsA = t; + colsB = t1; + common = 1; + /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */ + double A[rowsA * 
common]; + double B[common * colsB]; + double C[rowsA * colsB]; + double D[rowsA * colsB]; + + + init (A, rowsA, common); + init (B, common, colsB); + init0 (C, D, rowsA, colsB); + DM (rowsA, colsB, common, A, B, C); + +
[gcc(refs/users/meissner/heads/work161-dmf)] PowerPC: Add support for 1,024 bit DMR registers.
https://gcc.gnu.org/g:762f784653fffaa00a77da285f7b32f0f4d75027 commit 762f784653fffaa00a77da285f7b32f0f4d75027 Author: Michael Meissner Date: Tue Mar 5 15:49:49 2024 -0500 PowerPC: Add support for 1,024 bit DMR registers. This patch is a preliminary patch to add the full 1,024 bit dense math registers (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the DMR register. This patch only adds the new 1,024 bit register support. It does not add support for any instructions that need 1,024 bit registers instead of 512 bit registers. I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit registers. The 'wD' constraint added in previous patches is used for these registers. I added support to do load and store of DMRs via the VSX registers, since there are no load/store dense math instructions. I added the new keyword '__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I don't have aliases for __dmr512 and __dmr1024 that we've discussed internally. The patches have been tested on both little and big endian systems. Can I check it into the master branch? 2024-03-05 Michael Meissner gcc/ * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec. (UNSPEC_DM_INSERT512_LOWER): Likewise. (UNSPEC_DM_EXTRACT512): Likewise. (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise. (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise. (movtdo): New define_expand and define_insn_and_split to implement 1,024 bit DMR registers. (movtdo_insert512_upper): New insn. (movtdo_insert512_lower): Likewise. (movtdo_extract512): Likewise. (reload_dmr_from_memory): Likewise. (reload_dmr_to_memory): Likewise. * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR support. (rs6000_init_builtins): Add support for __dmr keyword. * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support for TDOmode. (rs6000_function_arg): Likewise. * config/rs6000/rs6000-modes.def (TDOmode): New mode. 
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add support for TDOmode. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_hard_regno_mode_ok): Likewise. (rs6000_modes_tieable_p): Likewise. (rs6000_debug_reg_global): Likewise. (rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload hooks for DMR mode. (reg_offset_addressing_ok_p): Add support for TDOmode. (rs6000_emit_move): Likewise. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_preferred_reload_class): Likewise. (rs6000_secondary_reload_class): Likewise. (rs6000_mangle_type): Add mangling for __dmr type. (rs6000_dmr_register_move_cost): Add support for TDOmode. (rs6000_split_multireg_move): Likewise. (rs6000_invalid_conversion): Likewise. * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode. (enum rs6000_builtin_type_index): Add DMR type nodes. (dmr_type_node): Likewise. (ptr_dmr_type_node): Likewise. gcc/testsuite/ * gcc.target/powerpc/dm-1024bit.c: New test. Diff: --- gcc/config/rs6000/mma.md | 154 ++ gcc/config/rs6000/rs6000-builtin.cc | 17 +++ gcc/config/rs6000/rs6000-call.cc | 10 +- gcc/config/rs6000/rs6000-modes.def| 4 + gcc/config/rs6000/rs6000.cc | 102 - gcc/config/rs6000/rs6000.h| 6 +- gcc/testsuite/gcc.target/powerpc/dm-1024bit.c | 63 +++ 7 files changed, 322 insertions(+), 34 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index f3870eac51a..4f9c59046ea 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -91,6 +91,11 @@ UNSPEC_MMA_XVI8GER4SPP UNSPEC_MMA_XXMFACC UNSPEC_MMA_XXMTACC + UNSPEC_DM_INSERT512_UPPER + UNSPEC_DM_INSERT512_LOWER + UNSPEC_DM_EXTRACT512 + UNSPEC_DMR_RELOAD_FROM_MEMORY + UNSPEC_DMR_RELOAD_TO_MEMORY ]) (define_c_enum "unspecv" @@ -770,3 +775,152 @@ } [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) + +;; TDOmode (__dmr keyword for 1,024 bit registers). 
+(define_expand "movtdo" + [(set (match_operand:TDO 0 "nonimmediate_operand") + (match_operand:TDO 1 "input_operand"))] + "TARGET_MMA_DENSE_MATH" +{ + rs6000_emit_move (operands[0], operands[1], TDOmode); + DONE; +}) + +(define_insn_and_split "*movtdo" + [(set (match_operand:TDO 0 "nonim
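The commit message above says that DMRs are loaded and stored through the VSX registers because there are no dense math load/store instructions, and the ChangeLog lists insert512 and extract512 patterns. The staging idea can be sketched in portable C as below. Everything here is a hypothetical model: the struct, the function names, and in particular the choice that half 0 is the first 64 bytes are assumptions of the sketch and say nothing about the actual hardware layout or endianness handling.

```c
#include <stdint.h>
#include <string.h>

enum
{
  MODEL_HALF_BYTES = 64,   /* 512 bits, i.e. four 128-bit VSX registers */
  MODEL_DMR_BYTES = 128    /* 1,024 bits */
};

/* Model of one 1,024 bit dense math register.  */
struct model_dmr
{
  uint8_t bytes[MODEL_DMR_BYTES];
};

/* Copy 512 bits from SRC into half HALF (0 or 1) of DMR.  */
void
model_insert512 (struct model_dmr *dmr, int half, const uint8_t *src)
{
  memcpy (dmr->bytes + (size_t) half * MODEL_HALF_BYTES, src,
          MODEL_HALF_BYTES);
}

/* Copy half HALF (0 or 1) of DMR out into DST.  */
void
model_extract512 (const struct model_dmr *dmr, int half, uint8_t *dst)
{
  memcpy (dst, dmr->bytes + (size_t) half * MODEL_HALF_BYTES,
          MODEL_HALF_BYTES);
}

/* Load a DMR from memory by staging each half in a buffer that stands in
   for four VSX registers.  */
void
model_load_dmr (struct model_dmr *dmr, const uint8_t *mem)
{
  uint8_t vsx[MODEL_HALF_BYTES];
  for (int half = 0; half < 2; half++)
    {
      memcpy (vsx, mem + (size_t) half * MODEL_HALF_BYTES, sizeof vsx);
      model_insert512 (dmr, half, vsx);
    }
}

/* Store a DMR to memory through the same staging buffer.  */
void
model_store_dmr (const struct model_dmr *dmr, uint8_t *mem)
{
  uint8_t vsx[MODEL_HALF_BYTES];
  for (int half = 0; half < 2; half++)
    {
      model_extract512 (dmr, half, vsx);
      memcpy (mem + (size_t) half * MODEL_HALF_BYTES, vsx, sizeof vsx);
    }
}
```

The point of the sketch is only the data flow: every transfer between memory and a DMR passes through 512 bits of VSX-sized staging, twice per register.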
[gcc(refs/users/meissner/heads/work161-dmf)] Update ChangeLog.*
https://gcc.gnu.org/g:75f93f569f3a30d58aa6dbb17fa00aa99cea6f32 commit 75f93f569f3a30d58aa6dbb17fa00aa99cea6f32 Author: Michael Meissner Date: Tue Mar 5 15:54:19 2024 -0500 Update ChangeLog.* Diff: --- gcc/ChangeLog.dmf | 93 +-- 1 file changed, 28 insertions(+), 65 deletions(-) diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf index 5de91e24b91..9a88684bc29 100644 --- a/gcc/ChangeLog.dmf +++ b/gcc/ChangeLog.dmf @@ -1,4 +1,4 @@ - Branch work161-dmf, patch #145 + Branch work161-dmf, patch 156 PowerPC: Add support for 1,024 bit DMR registers. @@ -70,7 +70,19 @@ gcc/testsuite/ * gcc.target/powerpc/dm-1024bit.c: New test. - Branch work161-dmf, patch #144 + Branch work161-dmf, patch 155 + +Add dense math test for new instruction names. + +2024-03-05 Michael Meissner + +gcc/testsuite/ + + * gcc.target/powerpc/dm-double-test.c: New test. + * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New + target test. + + Branch work161-dmf, patch 154 PowerPC: Switch to dense math names for all MMA operations. @@ -79,59 +91,12 @@ the original name used in power10 to the new name when used with the dense math system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the same bits for either spelling. -The patches have been tested on both little and big endian systems. Can I check -it into the master branch? - For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the instruction. However, the prefixed instructions have a 'pm' prefix, and we add the 'dm' prefix afterwards. To prevent having two sets of parallel int attributes, we remove the "pm" prefix from the instruction string in the attributes, and add it later, both in the insn name and in the output template. -For example, previously we had - - (define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) - - ;; ... 
- - (define_insn "mma_" -[(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") - (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") - (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")] - MMA_VVI4I4I8))] -"TARGET_MMA" -" %A0,%x1,%x2,%3,%4,%5" -[(set_attr "type" "mma") - (set_attr "prefixed" "yes") - (set_attr "isa" "dm,not_dm,not_dm")]) - -And now we have: - - (define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")]) - - ;; ... - - (define_insn "mma_pm" -[(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") - (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") - (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")] - MMA_VVI4I4I8))] -"TARGET_MMA" -"@ - pmdm %A0,%x1,%x2,%3,%4,%5 - pm %A0,%x1,%x2,%3,%4,%5 - pm %A0,%x1,%x2,%3,%4,%5" -[(set_attr "type" "mma") - (set_attr "prefixed" "yes") - (set_attr "isa" "dm,not_dm,not_dm")]) - - 2024-03-05 Michael Meissner gcc/ @@ -168,13 +133,7 @@ gcc/ (mma_pm): Likewise. (mma_pm): Likewise. -gcc/testsuite/ - - * gcc.target/powerpc/dm-double-test.c: New test. - * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New - target test. - - Branch work161-dmf, patch #143 + Branch work161-dmf, patch 153 Add support for dense math registers. @@ -280,12 +239,8 @@ produce other changes in the future. (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD. (REGISTER_NAMES): Add DMR registers. (ADDITIONAL_REGISTER_NAMES): Likewise. - * config/rs6000/rs6000.md (FIRST_DMR_REGNO): New constant. - (LAST_DMR_REGNO): Likewise. - (isa attribute): Add 'dm' and 'not_dm' attributes. - (enabled attribute): Support 'dm' and 'not_dm' attributes. 
- Branch work161-dmf, patch #142 + Branch work161-dmf, patch 152 Add wD constraint. @@ -294,7 +249,7 @@ that overlap with VSX registers 0..31 on power10. Future patches will add the support for a separate accumulator register class that will be used when the support for dense math registe
[gcc(refs/users/meissner/heads/work161-dmf)] Revert changes
https://gcc.gnu.org/g:0103569c3644cbc05eb37bbda6c804cb3800f663 commit 0103569c3644cbc05eb37bbda6c804cb3800f663 Author: Michael Meissner Date: Tue Mar 5 17:24:43 2024 -0500 Revert changes Diff: --- gcc/config/rs6000/mma.md | 360 -- gcc/config/rs6000/predicates.md | 21 +- gcc/config/rs6000/rs6000-builtin.cc | 22 +- gcc/config/rs6000/rs6000-call.cc | 10 +- gcc/config/rs6000/rs6000-modes.def| 4 - gcc/config/rs6000/rs6000.cc | 320 --- gcc/config/rs6000/rs6000.h| 49 +-- gcc/config/rs6000/rs6000.md | 2 - gcc/testsuite/gcc.target/powerpc/dm-1024bit.c | 63 gcc/testsuite/gcc.target/powerpc/dm-double-test.c | 194 gcc/testsuite/lib/target-supports.exp | 23 -- 11 files changed, 142 insertions(+), 926 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 4f9c59046ea..49cf5f8fe43 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -91,11 +91,6 @@ UNSPEC_MMA_XVI8GER4SPP UNSPEC_MMA_XXMFACC UNSPEC_MMA_XXMTACC - UNSPEC_DM_INSERT512_UPPER - UNSPEC_DM_INSERT512_LOWER - UNSPEC_DM_EXTRACT512 - UNSPEC_DMR_RELOAD_FROM_MEMORY - UNSPEC_DMR_RELOAD_TO_MEMORY ]) (define_c_enum "unspecv" @@ -229,47 +224,44 @@ (UNSPEC_MMA_XVF64GERNP "xvf64gernp") (UNSPEC_MMA_XVF64GERNN "xvf64gernn")]) -;; The "pm" prefix is not in these expansions, so that we can generate -;; pmdmxvi4ger8 on systems with dense math registers and xvi4ger8 on systems -;; without dense math registers. 
-(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")]) +(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) -(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "xvi4ger8pp")]) +(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "pmxvi4ger8pp")]) -(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"xvi16ger2") -(UNSPEC_MMA_PMXVI16GER2S "xvi16ger2s") -(UNSPEC_MMA_PMXVF16GER2"xvf16ger2") -(UNSPEC_MMA_PMXVBF16GER2 "xvbf16ger2")]) +(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2") +(UNSPEC_MMA_PMXVI16GER2S "pmxvi16ger2s") +(UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2") +(UNSPEC_MMA_PMXVBF16GER2 "pmxvbf16ger2")]) -(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "xvi16ger2pp") -(UNSPEC_MMA_PMXVI16GER2SPP "xvi16ger2spp") -(UNSPEC_MMA_PMXVF16GER2PP "xvf16ger2pp") -(UNSPEC_MMA_PMXVF16GER2PN "xvf16ger2pn") -(UNSPEC_MMA_PMXVF16GER2NP "xvf16ger2np") -(UNSPEC_MMA_PMXVF16GER2NN "xvf16ger2nn") -(UNSPEC_MMA_PMXVBF16GER2PP "xvbf16ger2pp") -(UNSPEC_MMA_PMXVBF16GER2PN "xvbf16ger2pn") -(UNSPEC_MMA_PMXVBF16GER2NP "xvbf16ger2np") -(UNSPEC_MMA_PMXVBF16GER2NN "xvbf16ger2nn")]) +(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") +(UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp") +(UNSPEC_MMA_PMXVF16GER2PP "pmxvf16ger2pp") +(UNSPEC_MMA_PMXVF16GER2PN "pmxvf16ger2pn") +(UNSPEC_MMA_PMXVF16GER2NP "pmxvf16ger2np") +(UNSPEC_MMA_PMXVF16GER2NN "pmxvf16ger2nn") +(UNSPEC_MMA_PMXVBF16GER2PP "pmxvbf16ger2pp") +(UNSPEC_MMA_PMXVBF16GER2PN "pmxvbf16ger2pn") +(UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np") +(UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")]) -(define_int_attr vvi4i4[(UNSPEC_MMA_PMXVF32GER "xvf32ger")]) +(define_int_attr vvi4i4[(UNSPEC_MMA_PMXVF32GER "pmxvf32ger")]) -(define_int_attr avvi4i4 [(UNSPEC_MMA_PMXVF32GERPP "xvf32gerpp") -(UNSPEC_MMA_PMXVF32GERPN "xvf32gerpn") -(UNSPEC_MMA_PMXVF32GERNP "xvf32gernp") -(UNSPEC_MMA_PMXVF32GERNN "xvf32gernn")]) +(define_int_attr avvi4i4 [(UNSPEC_MMA_PMXVF32GERPP "pmxvf32gerpp") +(UNSPEC_MMA_PMXVF32GERPN "p
[gcc r14-9325] ctf: fix incorrect CTF for multi-dimensional array types
https://gcc.gnu.org/g:5d24bf3afd1bea3e51b87fb7ff24c21e29913999 commit r14-9325-g5d24bf3afd1bea3e51b87fb7ff24c21e29913999 Author: Cupertino Miranda Date: Thu Feb 29 10:56:13 2024 -0800 ctf: fix incorrect CTF for multi-dimensional array types PR debug/114186 DWARF DIEs of type DW_TAG_subrange_type are linked together to represent the information about the subsequent dimensions. The CTF processing was so far working through them in the opposite (incorrect) order. While fixing the issue, refactor the code a bit for readability. co-authored-By: Indu Bhagat gcc/ PR debug/114186 * dwarf2ctf.cc (gen_ctf_array_type): Invoke the ctf_add_array () in the correct order of the dimensions. (gen_ctf_subrange_type): Refactor out handling of DW_TAG_subrange_type DIE to here. gcc/testsuite/ PR debug/114186 * gcc.dg/debug/ctf/ctf-array-6.c: Add test. Diff: --- gcc/dwarf2ctf.cc | 158 +-- gcc/testsuite/gcc.dg/debug/ctf/ctf-array-6.c | 14 +++ 2 files changed, 89 insertions(+), 83 deletions(-) diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc index dca86edfffa..77d6bf89689 100644 --- a/gcc/dwarf2ctf.cc +++ b/gcc/dwarf2ctf.cc @@ -349,105 +349,97 @@ gen_ctf_pointer_type (ctf_container_ref ctfc, dw_die_ref ptr_type) return ptr_type_id; } -/* Generate CTF for an array type. */ +/* Recursively generate CTF for array dimensions starting at DIE C (of type + DW_TAG_subrange_type) until DIE LAST (of type DW_TAG_subrange_type) is + reached. ARRAY_ELEMS_TYPE_ID is base type for the array. 
*/ static ctf_id_t -gen_ctf_array_type (ctf_container_ref ctfc, dw_die_ref array_type) +gen_ctf_subrange_type (ctf_container_ref ctfc, ctf_id_t array_elems_type_id, + dw_die_ref c, dw_die_ref last) { - dw_die_ref c; - ctf_id_t array_elems_type_id = CTF_NULL_TYPEID; + ctf_arinfo_t arinfo; + ctf_id_t array_node_type_id = CTF_NULL_TYPEID; - int vector_type_p = get_AT_flag (array_type, DW_AT_GNU_vector); - if (vector_type_p) -return array_elems_type_id; + dw_attr_node *upper_bound_at; + dw_die_ref array_index_type; + uint32_t array_num_elements; - dw_die_ref array_elems_type = ctf_get_AT_type (array_type); + if (dw_get_die_tag (c) == DW_TAG_subrange_type) +{ + /* When DW_AT_upper_bound is used to specify the size of an +array in DWARF, it is usually an unsigned constant +specifying the upper bound index of the array. However, +for unsized arrays, such as foo[] or bar[0], +DW_AT_upper_bound is a signed integer constant +instead. */ + + upper_bound_at = get_AT (c, DW_AT_upper_bound); + if (upper_bound_at + && AT_class (upper_bound_at) == dw_val_class_unsigned_const) + /* This is the upper bound index. */ + array_num_elements = get_AT_unsigned (c, DW_AT_upper_bound) + 1; + else if (get_AT (c, DW_AT_count)) + array_num_elements = get_AT_unsigned (c, DW_AT_count); + else + { + /* This is a VLA of some kind. */ + array_num_elements = 0; + } +} + else +gcc_unreachable (); - /* First, register the type of the array elements if needed. */ - array_elems_type_id = gen_ctf_type (ctfc, array_elems_type); + /* Ok, mount and register the array type. Note how the array + type we register here is the type of the elements in + subsequent "dimensions", if there are any. */ + arinfo.ctr_nelems = array_num_elements; - /* DWARF array types pretend C supports multi-dimensional arrays. - So for the type int[N][M], the array type DIE contains two - subrange_type children, the first with upper bound N-1 and the - second with upper bound M-1. 
+ array_index_type = ctf_get_AT_type (c); + arinfo.ctr_index = gen_ctf_type (ctfc, array_index_type); - CTF, on the other hand, just encodes each array type in its own - array type CTF struct. Therefore we have to iterate on the - children and create all the needed types. */ + if (c == last) +arinfo.ctr_contents = array_elems_type_id; + else +arinfo.ctr_contents = gen_ctf_subrange_type (ctfc, array_elems_type_id, +dw_get_die_sib (c), last); - c = dw_get_die_child (array_type); - gcc_assert (c); - do -{ - ctf_arinfo_t arinfo; - dw_die_ref array_index_type; - uint32_t array_num_elements; + if (!ctf_type_exists (ctfc, c, &array_node_type_id)) +array_node_type_id = ctf_add_array (ctfc, CTF_ADD_ROOT, &arinfo, c); - c = dw_get_die_sib (c); + return array_node_type_id; +} - if (dw_get_die_tag (c) == DW_TAG_subrange_type) - { - dw_attr_node *upper_bound_at; - - array_index_type = ctf_get_AT_type (c); - - /* When DW_AT_upper_bound is used to specify the size of an -a
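The ordering fix in the hunk above can be modelled without any DWARF or CTF machinery. In the sketch below, each dimension is resolved recursively so that the contents of a dimension are the array type built from the dimensions that follow it, and the first dimension becomes the outermost array. The table, the MODEL_* constants and model_add_dims are hypothetical and only mirror the shape of the recursion; they are not the dwarf2ctf.cc API.

```c
#include <stddef.h>

#define MODEL_MAX_TYPES 16

enum
{
  MODEL_ELEM = -1,  /* stands for the array element type, e.g. int */
  MODEL_ERR = -2    /* the table is full */
};

/* One array type: element count plus the id of what it contains.  */
struct model_array
{
  unsigned nelems;
  int contents;
};

struct model_table
{
  struct model_array types[MODEL_MAX_TYPES];
  int count;
};

/* Register array types for DIMS[0..NDIMS-1] and return the id of the
   outermost one.  The inner dimensions are registered first, so the
   contents id is already known when the current dimension is added.  */
int
model_add_dims (struct model_table *t, const unsigned *dims, size_t ndims)
{
  if (ndims == 0)
    return MODEL_ELEM;

  int contents = model_add_dims (t, dims + 1, ndims - 1);
  if (contents == MODEL_ERR || t->count >= MODEL_MAX_TYPES)
    return MODEL_ERR;

  t->types[t->count].nelems = dims[0];
  t->types[t->count].contents = contents;
  return t->count++;
}
```

For a declaration such as int a[2][3], the model registers "array of 3 int" first and then "array of 2" whose contents is that type, which matches the C meaning of the declaration.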
[gcc r14-9326] Daily bump.
https://gcc.gnu.org/g:214dadf30a3bab0d02b8c6512a2d0475e2643dc7 commit r14-9326-g214dadf30a3bab0d02b8c6512a2d0475e2643dc7 Author: GCC Administrator Date: Wed Mar 6 00:17:18 2024 + Daily bump. Diff: --- gcc/ChangeLog | 103 gcc/DATESTAMP | 2 +- gcc/c-family/ChangeLog | 8 gcc/cp/ChangeLog| 5 +++ gcc/testsuite/ChangeLog | 63 + 5 files changed, 180 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 5eb0d89fa7e..89da2603913 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,106 @@ +2024-03-05 Cupertino Miranda + Indu Bhagat + + PR debug/114186 + * dwarf2ctf.cc (gen_ctf_array_type): Invoke the ctf_add_array () + in the correct order of the dimensions. + (gen_ctf_subrange_type): Refactor out handling of + DW_TAG_subrange_type DIE to here. + +2024-03-05 Richard Sandiford + + PR sanitizer/97696 + * asan.cc (asan_expand_mark_ifn): Allow the length to be a poly_int. + +2024-03-05 Richard Sandiford + + * config/aarch64/aarch64.md (stride_type): Remove luti_consecutive + and luti_strided. + * config/aarch64/aarch64-sme.md + (@aarch64_sme_lut): Remove stride_type attribute. + (@aarch64_sme_lut_strided2): Delete. + (@aarch64_sme_lut_strided4): Likewise. + * config/aarch64/aarch64-early-ra.cc (is_stride_candidate) + (early_ra::maybe_convert_to_strided_access): Remove support for + strided LUTI2 and LUTI4. + +2024-03-05 Richard Earnshaw + + PR target/113510 + * config/arm/thumb1.md (peephole2 to fuse mov imm/add SP): Use + low_register_operand. + +2024-03-05 Georg-Johann Lay + + * config/avr/avr.md: Add two RTL peepholes for PLUS, IOR and AND + in HI, PSI, SI that swap operation order from "X = CST, X o= Y" + to "X = Y, X o= CST". + +2024-03-05 Xi Ruoyao + + * config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add + s9 as an alias of r22. + +2024-03-05 Roger Sayle + + * config/avr/avr-protos.h (avr_out_insv): New proto. + * config/avr/avr.cc (avr_out_insv): New function. + (avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle case. 
+ (avr_cbranch_cost) [ZERO_EXTRACT]: Adjust rtx costs. + * config/avr/avr.md (define_attr "adjust_len") Add insv. + (andhi3, *andhi3, andpsi3, *andpsi3, andsi3, *andsi3): + Add constraint alternative where the 3rd operand is a power + of 2, and the source register may differ from the destination. + (*insv.any_shift._split): Call avr_out_insv to output + instructions. Set attr "length" to "insv". + * config/avr/constraints.md (Cb2, Cb3, Cb4): New constraints. + +2024-03-05 Richard Biener + + PR tree-optimization/114231 + * tree-vect-slp.cc (vect_analyze_slp): Lookup patterns when + processing a BB SLP root. + +2024-03-05 Jakub Jelinek + + PR rtl-optimization/114211 + * lower-subreg.cc (resolve_simple_move): For double-word + rotates by BITS_PER_WORD if there is overlap between source + and destination use a temporary. + +2024-03-05 Jakub Jelinek + + PR middle-end/114157 + * gimple-lower-bitint.cc: Include stor-layout.h. + (mergeable_op): Return true for BIT_FIELD_REF. + (struct bitint_large_huge): Declare handle_bit_field_ref method. + (bitint_large_huge::handle_bit_field_ref): New method. + (bitint_large_huge::handle_stmt): Use it for BIT_FIELD_REF. + +2024-03-05 Jakub Jelinek + + PR target/114116 + * config/i386/i386.h (enum call_saved_registers_type): Add + TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP enumerator. + * config/i386/i386-options.cc (ix86_set_func_type): Remove + has_no_callee_saved_registers variable, add no_callee_saved_registers + instead, initialize it depending on whether it is + no_callee_saved_registers function or not. Don't set it if + no_caller_saved_registers attribute is present. Adjust users. + * config/i386/i386.cc (ix86_function_ok_for_sibcall): Handle + TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP like + TYPE_NO_CALLEE_SAVED_REGISTERS. + (ix86_save_reg): Handle TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP. + +2024-03-05 Pan Li + + * config/riscv/riscv.cc (riscv_v_adjust_bytesize): Cleanup unused + mode_size related code. 
+ +2024-03-05 Patrick Palka + + * doc/invoke.texi (-Wno-global-module): Document. + 2024-03-04 David Faust * config/bpf/bpf-protos.h (bpf_expand_setmem): New prototype. diff --git a/gcc/DATESTAMP b/gcc/DATESTAMP index 8585b3d500e..c7e324d32c0 100644 --- a/gcc/DATESTAMP +++ b/gcc/DATESTAMP @@ -1 +1 @@ -20240305 +20240306 diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog index a8f5bfbf772
[gcc r11-11270] Daily bump.
https://gcc.gnu.org/g:16ead05d13ac69aca5a385148dac9109e199b10d commit r11-11270-g16ead05d13ac69aca5a385148dac9109e199b10d Author: GCC Administrator Date: Wed Mar 6 00:19:12 2024 + Daily bump. Diff: --- gcc/DATESTAMP | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/DATESTAMP b/gcc/DATESTAMP index 8585b3d500e..c7e324d32c0 100644 --- a/gcc/DATESTAMP +++ b/gcc/DATESTAMP @@ -1 +1 @@ -20240305 +20240306
[gcc r12-10195] Daily bump.
https://gcc.gnu.org/g:81161618d3966b5da6b7627c86d5390b646a9e0e commit r12-10195-g81161618d3966b5da6b7627c86d5390b646a9e0e Author: GCC Administrator Date: Wed Mar 6 00:19:49 2024 + Daily bump. Diff: --- gcc/DATESTAMP | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/DATESTAMP b/gcc/DATESTAMP index 8585b3d500e..c7e324d32c0 100644 --- a/gcc/DATESTAMP +++ b/gcc/DATESTAMP @@ -1 +1 @@ -20240305 +20240306
[gcc(refs/users/meissner/heads/work161-dmf)] Add support for dense math registers.
https://gcc.gnu.org/g:d4a30470e9874de74f013876c95152df559c93a6 commit d4a30470e9874de74f013876c95152df559c93a6 Author: Michael Meissner Date: Tue Mar 5 19:33:04 2024 -0500 Add support for dense math registers. The MMA subsystem added the notion of accumulator registers as an optional feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with the VSX registers 0..31, but logically the accumulator registers were separate from the FPR registers. In ISA 3.1, it was anticipated that in future systems, the accumulator registers might not overlap with the FPR registers. This patch adds the support for dense math registers as separate registers. This particular patch does not change the MMA support to use the accumulators within the dense math registers. This patch just adds the basic support for having separate DMRs. The next patch will switch the MMA support to use the accumulators if -mcpu=future is used. For testing purposes, I added an undocumented option '-mdense-math' to enable or disable the dense math support. This patch adds a new constraint (wD). If MMA is selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint will allow access to accumulators that overlap with VSX registers 0..31. If both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint will only allow dense math registers. This patch modifies the existing %A output modifier. If MMA is selected but dense math is not selected, then the %A output modifier converts the VSX register number to the accumulator number, by dividing it by 4. If both MMA and dense math are selected, then %A will map the separate DMR registers into 0..7. 
The intention is that user code using extended asm can be modified to run on both MMA without dense math and MMA with dense math: 1) If possible, don't use extended asm, but instead use the MMA built-in functions; 2) If you do need to write extended asm, change the d constraints targeting accumulators to wD; 3) Only use the built-in zero, assemble and disassemble functions to move data between vector quad types and dense math accumulators. I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in the extended asm code. The reason is these instructions assume there is a 1-to-1 correspondence between 4 adjacent FPR registers and an accumulator that overlaps with those registers. With accumulators now being separate registers, there no longer is a 1-to-1 correspondence. It is possible that the mangling for DMRs and the GDB register numbers may produce other changes in the future. 2024-03-05 Michael Meissner * config/rs6000/mma.md (movxo): Add comments about dense math registers. (movxo_nodm): Rename from movxo and restrict the usage to machines without dense math registers. (movxo_dm): New insn for movxo support for machines with dense math registers. (mma_): Restrict usage to machines without dense math registers. (mma_xxsetaccz): Make a define_expand, and add support for dense math registers. (mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to machines without dense math registers. (mma_dmsetaccz): New insn. * config/rs6000/predicates.md (dmr_operand): New predicate. (accumulator_operand): Add support for dense math registers. * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do not de-prime accumulator when disassembling a vector quad. * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE. (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR. (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD constraint. (reload_reg_map): Likewise. (rs6000_reg_names): Likewise. 
(alt_reg_names): Likewise. (rs6000_hard_regno_nregs_internal): Likewise. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_debug_reg_global): Likewise. (rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Likewise. (rs6000_secondary_reload_memory): Add support for DMR registers. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_preferred_reload_class): Likewise. (rs6000_secondary_reload_class): Likewise. (print_operand): Make %A handle both FPRs and DMRs. (rs6000_dmr_register_move_cost): New helper function. (rs6000_register_move_cost): Add support for DMR registers.
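The %A behaviour described in the commit message above can be sketched in plain C. This is only a model of the description, not the code in print_operand: the function name sketch_percent_A and the register numbering (FPR/VSX registers 0..31, DMRs 0..7 in their own space) are assumptions made for illustration, as is the check that an overlapping accumulator starts on a multiple of 4.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering used only in this sketch: without dense math,
   REGNO is an FPR/VSX register number 0..31; with dense math, REGNO is a
   DMR number 0..7.  GCC's internal hard register numbers differ.  */

/* Return the accumulator number that %A is described as printing.  */
int
sketch_percent_A (int regno, bool dense_math)
{
  if (dense_math)
    {
      /* Separate DMRs map straight into 0..7.  */
      assert (regno >= 0 && regno < 8);
      return regno;
    }

  /* Assumption: an accumulator overlaps 4 adjacent FPRs, so the first
     register of the group is a multiple of 4.  */
  assert (regno >= 0 && regno < 32 && regno % 4 == 0);
  return regno / 4;
}
```

With this model, sketch_percent_A (8, false) gives accumulator 2, while sketch_percent_A (2, true) gives DMR 2 directly, which is why asm templates written with wD and %A do not need to change between the two configurations.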
[gcc(refs/users/meissner/heads/work161-dmf)] PowerPC: Switch to dense math names for all MMA operations.
https://gcc.gnu.org/g:f6f63c432351eae3d363fde51d2d1dc5ed68e901 commit f6f63c432351eae3d363fde51d2d1dc5ed68e901 Author: Michael Meissner Date: Tue Mar 5 19:36:25 2024 -0500 PowerPC: Switch to dense math names for all MMA operations. This patch changes the assembler instruction names for MMA instructions from the original name used in power10 to the new name when used with the dense math system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the same bits for either spelling. For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the instruction. However, the prefixed instructions have a 'pm' prefix, and we add the 'dm' prefix afterwards. To prevent having two sets of parallel int attributes, we remove the "pm" prefix from the instruction string in the attributes, and add it later, both in the insn name and in the output template. 2024-03-05 Michael Meissner gcc/ * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a "pm" prefix. (avvi4i4i8): Likewise. (vvi4i4i2): Likewise. (avvi4i4i2): Likewise. (vvi4i4): Likewise. (avvi4i4): Likewise. (pvi4i2): Likewise. (apvi4i2): Likewise. (vvi4i4i4): Likewise. (avvi4i4i4): Likewise. (mma_xxsetaccz): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_pm): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm' prefixes based on whether we have the original MMA specification or if we have dense math support. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. 
Diff: --- gcc/config/rs6000/mma.md | 161 +++ 1 file changed, 107 insertions(+), 54 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 2ce613b46cc..f3870eac51a 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -224,44 +224,47 @@ (UNSPEC_MMA_XVF64GERNP "xvf64gernp") (UNSPEC_MMA_XVF64GERNN "xvf64gernn")]) -(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) +;; The "pm" prefix is not in these expansions, so that we can generate +;; pmdmxvi4ger8 on systems with dense math registers and xvi4ger8 on systems +;; without dense math registers. +(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")]) -(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "pmxvi4ger8pp")]) +(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "xvi4ger8pp")]) -(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2") -(UNSPEC_MMA_PMXVI16GER2S "pmxvi16ger2s") -(UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2") -(UNSPEC_MMA_PMXVBF16GER2 "pmxvbf16ger2")]) +(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"xvi16ger2") +(UNSPEC_MMA_PMXVI16GER2S "xvi16ger2s") +(UNSPEC_MMA_PMXVF16GER2"xvf16ger2") +(UNSPEC_MMA_PMXVBF16GER2 "xvbf16ger2")]) -(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") -(UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp") -(UNSPEC_MMA_PMXVF16GER2PP "pmxvf16ger2pp") -(UNSPEC_MMA_PMXVF16GER2PN "pmxvf16ger2pn") -(UNSPEC_MMA_PMXVF16GER2NP "pmxvf16ger2np") -(UNSPEC_MMA_PMXVF16GER2NN "pmxvf16ger2nn") -(UNSPEC_MMA_PMXVBF16GER2PP "pmxvbf16ger2pp") -(UNSPEC_MMA_PMXVBF16GER2PN "pmxvbf16ger2pn") -(UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np") -(UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")]) +(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "xvi16ger2pp") +(UNSPEC_MMA_PMXVI16GER2SPP "xvi16ger2spp") +(UNSPEC_MMA_PMXVF16GER2PP "xvf16ger2pp") +(UNSPEC
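The naming rule in the commit message above (a 'dm' prefix for non-prefixed instructions, and 'pm' or 'pmdm' for prefixed ones) can be illustrated with a small C helper. This is a sketch of the rule as described, not how mma.md builds its output templates; sketch_mma_mnemonic and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build the assembler mnemonic for an MMA operation into BUF.
   BASE is the name with no prefix at all, e.g. "xvi4ger8" (this matches
   the int attributes after the patch, which no longer carry "pm").
   Returns the value of snprintf.  Hypothetical helper for illustration.  */
int
sketch_mma_mnemonic (char *buf, size_t size, const char *base,
		     bool prefixed, bool dense_math)
{
  assert (buf != NULL && base != NULL && strlen (base) > 0);

  /* "pm" comes first for prefixed instructions; "dm" follows it when
     dense math is enabled.  */
  return snprintf (buf, size, "%s%s%s",
		   prefixed ? "pm" : "",
		   dense_math ? "dm" : "",
		   base);
}
```

For example, base "xvf64gerpp" with dense math enabled yields "dmxvf64gerpp", and base "xvi4ger8" as a prefixed instruction yields "pmxvi4ger8" or "pmdmxvi4ger8" depending on the dense math setting.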
[gcc(refs/users/meissner/heads/work161-dmf)] Add dense math test for new instruction names.
https://gcc.gnu.org/g:a8bdcf985977ba12761daad4527da50721257138 commit a8bdcf985977ba12761daad4527da50721257138 Author: Michael Meissner Date: Tue Mar 5 19:38:01 2024 -0500 Add dense math test for new instruction names. 2024-03-05 Michael Meissner gcc/testsuite/ * gcc.target/powerpc/dm-double-test.c: New test. * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New target test. Diff: --- gcc/testsuite/gcc.target/powerpc/dm-double-test.c | 194 ++ gcc/testsuite/lib/target-supports.exp | 23 +++ 2 files changed, 217 insertions(+) diff --git a/gcc/testsuite/gcc.target/powerpc/dm-double-test.c b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c new file mode 100644 index 000..66c19779585 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c @@ -0,0 +1,194 @@ +/* Test derived from mma-double-1.c, modified for dense math. */ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_dense_math_ok } */ +/* { dg-options "-mdejagnu-cpu=future -O2" } */ + +#include +#include +#include + +typedef unsigned char vec_t __attribute__ ((vector_size (16))); +typedef double v4sf_t __attribute__ ((vector_size (16))); +#define SAVE_ACC(ACC, ldc, J) \ + __builtin_mma_disassemble_acc (result, ACC); \ + rowC = (v4sf_t *) &CO[0*ldc+J]; \ + rowC[0] += result[0]; \ + rowC = (v4sf_t *) &CO[1*ldc+J]; \ + rowC[0] += result[1]; \ + rowC = (v4sf_t *) &CO[2*ldc+J]; \ + rowC[0] += result[2]; \ + rowC = (v4sf_t *) &CO[3*ldc+J]; \ + rowC[0] += result[3]; + +void +DM (int m, int n, int k, double *A, double *B, double *C) +{ + __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; + v4sf_t result[4]; + v4sf_t *rowC; + for (int l = 0; l < n; l += 4) +{ + double *CO; + double *AO; + AO = A; + CO = C; + C += m * 4; + for (int j = 0; j < m; j += 16) + { + double *BO = B; + __builtin_mma_xxsetaccz (&acc0); + __builtin_mma_xxsetaccz (&acc1); + __builtin_mma_xxsetaccz (&acc2); + __builtin_mma_xxsetaccz (&acc3); + __builtin_mma_xxsetaccz (&acc4); + 
__builtin_mma_xxsetaccz (&acc5); + __builtin_mma_xxsetaccz (&acc6); + __builtin_mma_xxsetaccz (&acc7); + unsigned long i; + + for (i = 0; i < k; i++) + { + vec_t *rowA = (vec_t *) & AO[i * 16]; + __vector_pair rowB; + vec_t *rb = (vec_t *) & BO[i * 4]; + __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]); + __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]); + __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]); + __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]); + __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]); + __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]); + __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]); + __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]); + __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]); + } + SAVE_ACC (&acc0, m, 0); + SAVE_ACC (&acc2, m, 4); + SAVE_ACC (&acc1, m, 2); + SAVE_ACC (&acc3, m, 6); + SAVE_ACC (&acc4, m, 8); + SAVE_ACC (&acc6, m, 12); + SAVE_ACC (&acc5, m, 10); + SAVE_ACC (&acc7, m, 14); + AO += k * 16; + BO += k * 4; + CO += 16; + } + B += k * 4; +} +} + +void +init (double *matrix, int row, int column) +{ + for (int j = 0; j < column; j++) +{ + for (int i = 0; i < row; i++) + { + matrix[j * row + i] = (i * 16 + 2 + j) / 0.123; + } +} +} + +void +init0 (double *matrix, double *matrix1, int row, int column) +{ + for (int j = 0; j < column; j++) +for (int i = 0; i < row; i++) + matrix[j * row + i] = matrix1[j * row + i] = 0; +} + + +void +print (const char *name, const double *matrix, int row, int column) +{ + printf ("Matrix %s has %d rows and %d columns:\n", name, row, column); + for (int i = 0; i < row; i++) +{ + for (int j = 0; j < column; j++) + { + printf ("%f ", matrix[j * row + i]); + } + printf ("\n"); +} + printf ("\n"); +} + +int +main (int argc, char *argv[]) +{ + int rowsA, colsB, common; + int i, j, k; + int ret = 0; + + for (int t = 16; t <= 128; t += 16) +{ + for (int t1 = 4; t1 <= 16; t1 += 4) + { + rowsA = t; + colsB = t1; + common = 1; + /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */ + double A[rowsA * 
common]; + double B[common * colsB]; + double C[rowsA * colsB]; + double D[rowsA * colsB]; + + + init (A, rowsA, common); + init (B, common, colsB); + init0 (C, D, rowsA, colsB); + DM (rowsA, colsB, common, A, B, C); + +
[gcc(refs/users/meissner/heads/work161-dmf)] PowerPC: Add support for 1,024 bit DMR registers.
https://gcc.gnu.org/g:53d5166cb5f781cd1c24f1f1814e9f391449658b commit 53d5166cb5f781cd1c24f1f1814e9f391449658b Author: Michael Meissner Date: Tue Mar 5 19:43:58 2024 -0500 PowerPC: Add support for 1,024 bit DMR registers. This patch is a preliminary patch to add the full 1,024 bit dense math registers (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the DMR register. This patch only adds the new 1,024 bit register support. It does not add support for any instructions that need 1,024 bit registers instead of 512 bit registers. I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit registers. The 'wD' constraint added in previous patches is used for these registers. I added support to do load and store of DMRs via the VSX registers, since there are no load/store dense math instructions. I added the new keyword '__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I don't have aliases for __dmr512 and __dmr1024 that we've discussed internally. The patches have been tested on both little and big endian systems. Can I check it into the master branch? 2024-03-05 Michael Meissner gcc/ * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec. (UNSPEC_DM_INSERT512_LOWER): Likewise. (UNSPEC_DM_EXTRACT512): Likewise. (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise. (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise. (movtdo): New define_expand and define_insn_and_split to implement 1,024 bit DMR registers. (movtdo_insert512_upper): New insn. (movtdo_insert512_lower): Likewise. (movtdo_extract512): Likewise. (reload_dmr_from_memory): Likewise. (reload_dmr_to_memory): Likewise. * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR support. (rs6000_init_builtins): Add support for __dmr keyword. * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support for TDOmode. (rs6000_function_arg): Likewise. * config/rs6000/rs6000-modes.def (TDOmode): New mode. 
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add support for TDOmode. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_hard_regno_mode_ok): Likewise. (rs6000_modes_tieable_p): Likewise. (rs6000_debug_reg_global): Likewise. (rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload hooks for DMR mode. (reg_offset_addressing_ok_p): Add support for TDOmode. (rs6000_emit_move): Likewise. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_preferred_reload_class): Likewise. (rs6000_secondary_reload_class): Likewise. (rs6000_mangle_type): Add mangling for __dmr type. (rs6000_dmr_register_move_cost): Add support for TDOmode. (rs6000_split_multireg_move): Likewise. (rs6000_invalid_conversion): Likewise. * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode. (enum rs6000_builtin_type_index): Add DMR type nodes. (dmr_type_node): Likewise. (ptr_dmr_type_node): Likewise. gcc/testsuite/ * gcc.target/powerpc/dm-1024bit.c: New test. Diff: --- gcc/config/rs6000/mma.md | 154 ++ gcc/config/rs6000/rs6000-builtin.cc | 17 +++ gcc/config/rs6000/rs6000-call.cc | 10 +- gcc/config/rs6000/rs6000-modes.def| 4 + gcc/config/rs6000/rs6000.cc | 101 - gcc/config/rs6000/rs6000.h| 6 +- gcc/testsuite/gcc.target/powerpc/dm-1024bit.c | 63 +++ 7 files changed, 321 insertions(+), 34 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index f3870eac51a..4f9c59046ea 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -91,6 +91,11 @@ UNSPEC_MMA_XVI8GER4SPP UNSPEC_MMA_XXMFACC UNSPEC_MMA_XXMTACC + UNSPEC_DM_INSERT512_UPPER + UNSPEC_DM_INSERT512_LOWER + UNSPEC_DM_EXTRACT512 + UNSPEC_DMR_RELOAD_FROM_MEMORY + UNSPEC_DMR_RELOAD_TO_MEMORY ]) (define_c_enum "unspecv" @@ -770,3 +775,152 @@ } [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) + +;; TDOmode (__dmr keyword for 1,024 bit registers). 
+(define_expand "movtdo" + [(set (match_operand:TDO 0 "nonimmediate_operand") + (match_operand:TDO 1 "input_operand"))] + "TARGET_MMA_DENSE_MATH" +{ + rs6000_emit_move (operands[0], operands[1], TDOmode); + DONE; +}) + +(define_insn_and_split "*movtdo" + [(set (match_operand:TDO 0 "nonim
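The size relationships in the commit message above (a 1,024 bit DMR, 512 bit accumulators, and loads/stores going through 128 bit VSX registers) can be written down as a small C model. This is a sketch under stated assumptions: the type and function names are hypothetical, and the 'half' index is deliberately abstract because the message does not say which half of the byte layout corresponds to "upper" or "lower".

```c
#include <assert.h>
#include <string.h>

enum
{
  SKETCH_VSX_BYTES = 128 / 8,	/* one VSX register */
  SKETCH_ACC_BYTES = 512 / 8,	/* one MMA accumulator */
  SKETCH_DMR_BYTES = 1024 / 8	/* one full dense math register */
};

typedef struct { unsigned char bytes[SKETCH_ACC_BYTES]; } sketch_acc512;
typedef struct { unsigned char bytes[SKETCH_DMR_BYTES]; } sketch_dmr1024;

/* Number of VSX registers needed to carry BYTES of data.  */
int
sketch_vsx_regs_for (int bytes)
{
  assert (bytes > 0 && bytes % SKETCH_VSX_BYTES == 0);
  return bytes / SKETCH_VSX_BYTES;
}

/* Model of inserting 512 bits into one half (0 or 1) of a DMR.  */
void
sketch_insert512 (sketch_dmr1024 *dmr, const sketch_acc512 *src, int half)
{
  assert (half == 0 || half == 1);
  memcpy (dmr->bytes + half * SKETCH_ACC_BYTES, src->bytes, SKETCH_ACC_BYTES);
}

/* Model of extracting 512 bits from one half (0 or 1) of a DMR.  */
sketch_acc512
sketch_extract512 (const sketch_dmr1024 *dmr, int half)
{
  sketch_acc512 out;
  assert (half == 0 || half == 1);
  memcpy (out.bytes, dmr->bytes + half * SKETCH_ACC_BYTES, SKETCH_ACC_BYTES);
  return out;
}
```

In this model a full DMR takes 8 VSX registers' worth of data and a 512 bit half takes 4, which is one way to read why the patch adds separate insert/extract operations for each 512 bit half.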
[gcc r14-9327] c++/modules: befriending template from current class scope
https://gcc.gnu.org/g:b0d11bb02a4a4c7d61e9b53411ccdc54610b1429 commit r14-9327-gb0d11bb02a4a4c7d61e9b53411ccdc54610b1429 Author: Patrick Palka Date: Tue Mar 5 20:36:36 2024 -0500 c++/modules: befriending template from current class scope Here the TEMPLATE_DECL representing the template friend declaration naming B has class scope since the template B has class scope, but get_merge_kind assumes all DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P TEMPLATE_DECL have namespace scope and wrongly returns MK_named instead of MK_local_friend for the friend. gcc/cp/ChangeLog: * module.cc (trees_out::get_merge_kind) : Accommodate class-scope DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P TEMPLATE_DECL. Consolidate IDENTIFIER_ANON_P cases. gcc/testsuite/ChangeLog: * g++.dg/modules/friend-7.h: New test. * g++.dg/modules/friend-7_a.H: New test. * g++.dg/modules/friend-7_b.C: New test. Reviewed-by: Jason Merrill Diff: --- gcc/cp/module.cc | 19 +-- gcc/testsuite/g++.dg/modules/friend-7.h | 5 + gcc/testsuite/g++.dg/modules/friend-7_a.H | 3 +++ gcc/testsuite/g++.dg/modules/friend-7_b.C | 6 ++ 4 files changed, 23 insertions(+), 10 deletions(-) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 67f132d28d7..80b63a70a62 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -10498,21 +10498,20 @@ trees_out::get_merge_kind (tree decl, depset *dep) } } - if (RECORD_OR_UNION_TYPE_P (ctx)) + if (TREE_CODE (decl) == TEMPLATE_DECL + && DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (decl)) { - if (IDENTIFIER_ANON_P (DECL_NAME (decl))) - mk = MK_field; + mk = MK_local_friend; break; } - if (TREE_CODE (decl) == TEMPLATE_DECL - && DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (decl)) - mk = MK_local_friend; - else if (IDENTIFIER_ANON_P (DECL_NAME (decl))) + if (IDENTIFIER_ANON_P (DECL_NAME (decl))) { - if (DECL_IMPLICIT_TYPEDEF_P (decl) - && UNSCOPED_ENUM_P (TREE_TYPE (decl)) - && TYPE_VALUES (TREE_TYPE (decl))) + if (RECORD_OR_UNION_TYPE_P (ctx)) + mk = MK_field; + else if (DECL_IMPLICIT_TYPEDEF_P (decl) +&& UNSCOPED_ENUM_P 
(TREE_TYPE (decl)) +&& TYPE_VALUES (TREE_TYPE (decl))) /* Keyed by first enum value, and underlying type. */ mk = MK_enum; else diff --git a/gcc/testsuite/g++.dg/modules/friend-7.h b/gcc/testsuite/g++.dg/modules/friend-7.h new file mode 100644 index 000..c0f00394f3b --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/friend-7.h @@ -0,0 +1,5 @@ +template +struct A { + template struct B { }; + template friend struct B; +}; diff --git a/gcc/testsuite/g++.dg/modules/friend-7_a.H b/gcc/testsuite/g++.dg/modules/friend-7_a.H new file mode 100644 index 000..e750e4c7d8d --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/friend-7_a.H @@ -0,0 +1,3 @@ +// { dg-additional-options "-fmodule-header" } +// { dg-module-cmi {} } +#include "friend-7.h" diff --git a/gcc/testsuite/g++.dg/modules/friend-7_b.C b/gcc/testsuite/g++.dg/modules/friend-7_b.C new file mode 100644 index 000..eb5e45a3f43 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/friend-7_b.C @@ -0,0 +1,6 @@ +// { dg-additional-options "-fmodules-ts" } +#include "friend-7.h" +import "friend-7_a.H"; + +A a; +A::B b;
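The reordering in the module.cc hunk above can be modelled with plain booleans to show why the friend was misclassified. This is a simplified sketch, not the GCC code: the struct, the SK_* enumerators and both functions are hypothetical, the default of "named" is an assumption taken from the commit message, and SK_OTHER_ANON stands in for the final else branch, which is cut off in the diff as shown.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_merge_kind
{
  SK_NAMED,		/* assumed default, per "wrongly returns MK_named" */
  SK_LOCAL_FRIEND,
  SK_FIELD,
  SK_ENUM,
  SK_OTHER_ANON		/* placeholder for the branch elided in the diff */
};

struct sketch_decl
{
  bool template_friend;	 /* TEMPLATE_DECL with DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P */
  bool anon_name;	 /* IDENTIFIER_ANON_P (DECL_NAME (decl)) */
  bool class_scope;	 /* RECORD_OR_UNION_TYPE_P (ctx) */
  bool enum_with_values; /* implicit typedef of an unscoped enum with values */
};

/* Order of checks before the patch: class scope is tested first, so a
   class-scope template friend never reaches the friend check.  */
enum sketch_merge_kind
sketch_merge_kind_before (const struct sketch_decl *d)
{
  if (d->class_scope)
    return d->anon_name ? SK_FIELD : SK_NAMED;
  if (d->template_friend)
    return SK_LOCAL_FRIEND;
  if (d->anon_name)
    return d->enum_with_values ? SK_ENUM : SK_OTHER_ANON;
  return SK_NAMED;
}

/* Order of checks after the patch: the friend check comes first, and the
   anonymous-name cases are handled together.  */
enum sketch_merge_kind
sketch_merge_kind_after (const struct sketch_decl *d)
{
  if (d->template_friend)
    return SK_LOCAL_FRIEND;
  if (d->anon_name)
    {
      if (d->class_scope)
	return SK_FIELD;
      return d->enum_with_values ? SK_ENUM : SK_OTHER_ANON;
    }
  return SK_NAMED;
}
```

For a class-scope template friend (template_friend and class_scope both true), the first function returns SK_NAMED and the second returns SK_LOCAL_FRIEND, which matches the behaviour change the commit message describes.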
[gcc r14-9328] Fortran: Add user defined error messages for UDTIO.
https://gcc.gnu.org/g:21edfb0051ed8d0ff46d5638c2bce2dd71f26d1f commit r14-9328-g21edfb0051ed8d0ff46d5638c2bce2dd71f26d1f Author: Jerry DeLisle Date: Tue Mar 5 20:49:23 2024 -0800 Fortran: Add user defined error messages for UDTIO. The defines IOMSG_LEN and MSGLEN were redundant so these are combined into IOMSG_LEN as defined in io.h. The remainder of the patch adds checks for when a user defined derived type IO procedure sets the IOSTAT or IOMSG variables independent of the library defined I/O messages. PR libfortran/105456 libgfortran/ChangeLog: * io/io.h (IOMSG_LEN): Moved to here. * io/list_read.c (MSGLEN): Removed MSGLEN. (convert_integer): Changed MSGLEN to IOMSG_LEN. (parse_repeat): Likewise. (read_logical): Likewise. (read_integer): Likewise. (read_character): Likewise. (parse_real): Likewise. (read_complex): Likewise. (read_real): Likewise. (check_type): Likewise. (list_formatted_read_scalar): Adjust to IOMSG_LEN. (nml_read_obj): Add user defined error message. * io/transfer.c (unformatted_read): Add user defined error message. (unformatted_write): Add user defined error message. (formatted_transfer_scalar_read): Add user defined error message. (formatted_transfer_scalar_write): Add user defined error message. * io/write.c (list_formatted_write_scalar): Add user defined error message. (nml_write_obj): Add user defined error message. gcc/testsuite/ChangeLog: * gfortran.dg/pr105456-nmlr.f90: New test. * gfortran.dg/pr105456-nmlw.f90: New test. * gfortran.dg/pr105456-ruf.f90: New test. * gfortran.dg/pr105456-wf.f90: New test. * gfortran.dg/pr105456-wuf.f90: New test. 
Diff: --- gcc/testsuite/gfortran.dg/pr105456-nmlr.f90 | 60 + gcc/testsuite/gfortran.dg/pr105456-nmlw.f90 | 60 + gcc/testsuite/gfortran.dg/pr105456-ruf.f90 | 36 + gcc/testsuite/gfortran.dg/pr105456-wf.f90 | 34 gcc/testsuite/gfortran.dg/pr105456-wuf.f90 | 34 libgfortran/io/io.h | 7 ++- libgfortran/io/list_read.c | 81 +++-- libgfortran/io/transfer.c | 49 + libgfortran/io/write.c | 26 + 9 files changed, 343 insertions(+), 44 deletions(-) diff --git a/gcc/testsuite/gfortran.dg/pr105456-nmlr.f90 b/gcc/testsuite/gfortran.dg/pr105456-nmlr.f90 new file mode 100644 index 000..5ce5d082133 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr105456-nmlr.f90 @@ -0,0 +1,60 @@ +! { dg-do run } +! { dg-shouldfail "The users message" } +module m + implicit none + type :: t +character :: c +integer :: k + contains +procedure :: write_formatted +generic :: write(formatted) => write_formatted +procedure :: read_formatted +generic :: read(formatted) => read_formatted + end type +contains + subroutine write_formatted(dtv, unit, iotype, v_list, iostat, iomsg) +class(t), intent(in) :: dtv +integer, intent(in) :: unit +character(*), intent(in) :: iotype +integer, intent(in) :: v_list(:) +integer, intent(out) :: iostat +character(*), intent(inout) :: iomsg +if (iotype.eq."NAMELIST") then + write (unit, '(a1,a1,i3)') dtv%c,',', dtv%k +else + write (unit,*) dtv%c, dtv%k +end if + end subroutine + subroutine read_formatted(dtv, unit, iotype, v_list, iostat, iomsg) +class(t), intent(inout) :: dtv +integer, intent(in) :: unit +character(*), intent(in) :: iotype +integer, intent(in) :: v_list(:) +integer, intent(out) :: iostat +character(*), intent(inout) :: iomsg +character :: comma +if (iotype.eq."NAMELIST") then + read (unit, '(a1,a1,i3)') dtv%c, comma, dtv%k +else + read (unit,*) dtv%c, comma, dtv%k +endif +iostat = 42 +iomsg = "The users message" +if (comma /= ',') STOP 1 + end subroutine +end module + +program p + use m + implicit none + character(len=50) :: buffer + type(t) :: x + namelist 
/nml/ x + x = t('a', 5) + write (buffer, nml) + if (buffer.ne.' &NML X=a, 5 /') STOP 1 + x = t('x', 0) + read (buffer, nml) + if (x%c.ne.'a'.or. x%k.ne.5) STOP 2 +end +! { dg-output "Fortran runtime error: The users message" } diff --git a/gcc/testsuite/gfortran.dg/pr105456-nmlw.f90 b/gcc/testsuite/gfortran.dg/pr105456-nmlw.f90 new file mode 100644 index 000..2c496e611f4 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr105456-nmlw.f90 @@ -0,0 +1,60 @@ +! { dg-do run } +! { dg-shouldfail "The users message" } +module m + implicit none + type :: t +character :: c +integer :: k + contains +procedure :: wri