This patch was posted about a year ago during the GCC 14 time frame, and I'm posting it again in the hope of getting it into GCC 15. In the GCC 14 time frame, 1,024 bit registers were not supported because of the bit length in internal structures. In GCC 15, 1,024 bit registers are now supported.
Note, these patches are for a potential future PowerPC. They are not targeted towards a specific CPU, and they may change if/when a PowerPC with this instruction set is released. The main motivation is to get support for the 1,024 bit dense math registers into the current GCC.

In the current power10 hardware, the 8 512-bit accumulator registers overlap with the VSX registers 0..31. If dense math register support is added in a future machine, the accumulators will become separate registers. The current instructions will still work, using these new registers. If you use existing code, the VSX registers that currently overlap with the accumulators will not be used, and the separate dense math registers will be used instead.

One of the important changes in these patches is to add a new constraint ('wD'). When code is compiled for the power10, 'wD' will match the VSX registers 0..31 (i.e. the traditional floating point registers). When code is compiled for the potential future machine, 'wD' will match the new separate dense math registers. Thus __asm__ code that uses the accumulator registers should change its 'd' constraints to 'wD'.

The intention is that user code using extended asm can be modified to run on both MMA without dense math and MMA with dense math:

    1)  If possible, don't use extended asm, but instead use the MMA built-in functions;

    2)  If you do need to write extended asm, change the 'd' constraints targeting accumulators to 'wD' when using GCC 15 or later;

    3)  Only use the built-in zero, assemble and disassemble functions to create and move data between vector quad types and dense math accumulators. I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in the extended asm code. The reason is that these instructions assume there is a 1-to-1 correspondence between 4 adjacent FPR registers and the accumulator that overlaps with those registers. With accumulators now being separate registers, there is no longer a 1-to-1 correspondence.
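
As a rough illustration of the three guidelines above, here is a minimal sketch in C. It is not taken from the patches. The function names and the HAVE_WD_CONSTRAINT macro are hypothetical (define the macro only if your compiler accepts 'wD'), and the operand modifiers in the asm string are my assumption based on the existing power10 MMA support. The scalar fallback is there only so the sketch also builds on non-PowerPC hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HAVE_WD_CONSTRAINT is a hypothetical macro for this sketch; nothing in
   GCC defines it.  Without it we fall back to the older 'd' constraint.  */
#if defined(HAVE_WD_CONSTRAINT)
#define ACC_CONSTRAINT "+wD"
#else
#define ACC_CONSTRAINT "+d"
#endif

#if defined(__MMA__)
/* Guideline 2: if extended asm is unavoidable, let the constraint pick the
   accumulator.  The inputs use 'v' (VSX registers 32..63) so they cannot
   overlap a power10 accumulator.  %A0 and %x1 are assumed modifiers.  */
static inline void
ger_accumulate_asm (__vector_quad *acc,
                    __vector unsigned char a,
                    __vector unsigned char b)
{
  __asm__ ("xvf32gerpp %A0,%x1,%x2"
           : ACC_CONSTRAINT (*acc)
           : "v" (a), "v" (b));
}
#endif

/* Compute out[i][j] = a[i] * b[j] for 4-element float vectors.  */
static void
outer_product_4x4 (float out[4][4], const float a[4], const float b[4])
{
  assert (out != NULL && a != NULL && b != NULL);

#if defined(__MMA__)
  __vector_quad acc;
  __vector float va, vb, rows[4];

  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);

  /* Guideline 3: zero and unpack the accumulator with built-ins only,
     never with xxsetaccz/xxmfacc/xxmtacc written in asm.  */
  __builtin_mma_xxsetaccz (&acc);

  /* Guideline 1: prefer the built-in over extended asm.  */
  __builtin_mma_xvf32gerpp (&acc, (__vector unsigned char) va,
                            (__vector unsigned char) vb);

  __builtin_mma_disassemble_acc (rows, &acc);
  memcpy (out, rows, sizeof rows);
#else
  /* Portable fallback so the sketch builds without MMA support.  */
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      out[i][j] = a[i] * b[j];
#endif
}
```

Calling outer_product_4x4 (out, a, b) with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} should leave out[1][2] equal to 14. I have only reasoned about the MMA path, not run it on dense math hardware, so treat the asm helper in particular as a starting point.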

This patch set assumes that the 11 patches posted on October 25th, which separate the ISA flag bits from the architecture bits and add the -mcpu=future option, have been applied. If those patches are rejected, I would need to modify these patches to add an undocumented '-mfuture' option that would be set for dense math generation.

    * https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666529.html

There are 6 patches in this patch set:

Patch #1 enables using the vector pair load and store instructions when generating memory copy operations.

Patch #2 adds the 'wD' constraint, and modifies the MMA code to use 'wD' instead of 'd' or 'f'.

Patch #3 adds support for separate dense math registers if -mcpu=future is used. This support keeps the register size at 512 bits, issuing the instructions that are common between the power10 MMA instruction set and the future dense math instruction set.

Patch #4 changes the assembler instruction names from the original MMA instructions to the newer mnemonics for dense math instructions when -mcpu=future is used. The GAS assembler will issue the same bit pattern for the old name and the new name.

Patch #5 adds a test for dense math support.

Patch #6 adds support for the dense math instructions that use 1,024 bit registers. This patch adds a new keyword ('__dmr') for the 1,024 bit dense math registers. A new mode (TDOmode) is added for 1,024 bit registers. Only the register support is added in this patch.

Assuming these 6 patches go in, future patches will provide new built-in functions to issue the new instructions.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com