I'd like to ping the following MMA patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578288.html
Message-ID: <8393a33f-50ab-6720-0017-3f012803b...@linux.ibm.com>

Peter

Fwprop will happily optimize two xxsetaccz instructions into one xxsetaccz
by propagating the results of the first to the uses of the second. We really
don't want that to happen given the late priming/depriming of accumulators.

I fixed this by making the xxsetaccz source operand an unspec volatile.
I also removed the mma_xxsetaccz define_expand and define_insn_and_split
and replaced them with a simple define_insn. The expand and splitter patterns
were leftovers from the pre-opaque-mode code, when the xxsetaccz code was part
of the movpxi pattern, and we don't need them now.

Rather than adding a new test case, I was able to just modify the current test
case to add another __builtin_mma_xxsetaccz call, which shows the bad code gen
with unpatched compilers.

This passed bootstrap on powerpc64le-linux with no regressions. Ok for trunk?

We'll need this for sure in GCC 11. Ok there too after some trunk burn-in time?

GCC 10 suffers from the same issue, but since the code is different, I'll have
to determine a different solution, which I'll post as a separate patch.

gcc/
	* config/rs6000/mma.md (unspec): Delete UNSPEC_MMA_XXSETACCZ.
	(unspecv): Add UNSPECV_MMA_XXSETACCZ.
	(*mma_xxsetaccz): Delete.
	(mma_xxsetaccz): Change to define_insn.  Remove match_operand.
	Use UNSPECV_MMA_XXSETACCZ.
	* config/rs6000/rs6000.c (rs6000_rtx_costs): Use UNSPECV_MMA_XXSETACCZ.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-6.c: Add second call to xxsetaccz
	built-in.  Update instruction counts.
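
For illustration only, the kind of test change described above might look
roughly like the sketch below. It is not the actual contents of
mma-builtin-6.c. The names zero_two, acc_t and ZERO_ACC are hypothetical, and
the non-MMA fallback is an assumption added purely so the sketch also compiles
on hosts without MMA support (it says nothing about code generation there).

```c
#include <string.h>

#if defined(__MMA__)
/* Real MMA path: each call should emit its own xxsetaccz. */
typedef __vector_quad acc_t;
#define ZERO_ACC(p) __builtin_mma_xxsetaccz (p)
#else
/* Hypothetical stand-in for non-Power hosts: a 64-byte blob zeroed by memset. */
typedef struct { unsigned char bytes[64]; } acc_t;
#define ZERO_ACC(p) memset ((p), 0, sizeof *(p))
#endif

/* Zero two distinct accumulators and store both results. */
void
zero_two (acc_t *dst)
{
  acc_t acc0, acc1;
  ZERO_ACC (&acc0);
  ZERO_ACC (&acc1);
  dst[0] = acc0;
  dst[1] = acc1;
}
```

When built for a target with MMA enabled, the point of having two calls is that
a compiler with the problem described above may merge them into a single
xxsetaccz, whereas a fixed compiler should keep one per accumulator.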