On Mon, 2020-06-15 at 14:58 -0500, Peter Bergner via Gcc-patches wrote:
> This patch adds the actual MMA built-ins. The MMA accumulators are INOUT
> operands for most MMA instructions, but they are also very expensive to
> move around. For this reason, we have implemented a built-in API
> where the accumulators are passed using pass-by-reference/pointers, so
> the user won't use one accumulator as input and another as output,
> which would entail a lot of copies. However, using pointers gives us
> poor code generation when we expand the built-ins at normal expand time.
> We therefore expand the MMA built-ins early into gimple, converting
> the pass-by-reference calls to an internal built-in that uses pass-by-value
> calling convention, where we can enforce the input and output accumulators
> are the same. This gives us much better code generation.
>
> The associated test cases for these built-ins are in patch3.
>
> This patch plus patch1 passed bootstrap and regtesting with no regressions
> on both powerpc64le-linux and powerpc64-linux. Ok for trunk?
>
> Peter
>
> 2020-06-15  Peter Bergner  <berg...@linux.ibm.com>
>
> gcc/
> * config/rs6000/predicates.md (mma_input_operand): New predicate.
> * config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
>   BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
>   built-in functions.
>   (ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
>   PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
>   PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
>   PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
>   PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
>   PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
>   PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
>   PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
>   XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
>   XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
>   XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
>   XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
>   XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
>   XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.

checked noses, all have been found below.

> * config/rs6000/rs6000.c (rs6000_emit_move): Allow zero constants.
>   (print_operand) <case 'A'>: New output modifier.
>   (rs6000_split_multireg_move): Add support for inserting accumulator
>   priming and depriming instructions. Add support for splitting an
>   assemble accumulator pattern.
> * config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
>   rs6000_gimple_fold_mma_builtin): New functions.
>   (RS6000_BUILTIN_M): New macro.
>   (def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
>   (bdesc_mma): Add new MMA built-in support.
>   (htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
>   (rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
>   RS6000_BTM_MMA.
>   (rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
>   (rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
>   and rs6000_gimple_fold_mma_builtin.
>   (rs6000_expand_builtin): Call mma_expand_builtin.
>   Use RS6000_BTC_OPND_MASK.
>   (rs6000_init_builtins): Adjust comment. Call mma_init_builtins.
>   (htm_init_builtins): Use RS6000_BTC_OPND_MASK.
>   (builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
>   VSX_BUILTIN_XVCVBF16SP.
> * config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
>   RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
>   RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
>   (RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
>   RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
> * config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
>   (UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
>   UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
>   UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
>   UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
>   UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
>   UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
>   UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
>   UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
>   UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
>   UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
>   UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
>   UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
>   UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
>   UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
>   UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
>   UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
>   UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
>   UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2,
>   UNSPEC_MMA_XVF16GER2NN, UNSPEC_MMA_XVF16GER2NP,
>   UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
>   UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN,
>   UNSPEC_MMA_XVF32GERNP, UNSPEC_MMA_XVF32GERPN,
>   UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
>   UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP,
>   UNSPEC_MMA_XVF64GERPN, UNSPEC_MMA_XVF64GERPP,
>   UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
>   UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP,
>   UNSPEC_MMA_XVI4GER8, UNSPEC_MMA_XVI4GER8PP,
>   UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
>   UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC,
>   UNSPEC_MMA_XXMTACC): New.

ok

>   (MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
>   MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
>   MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
>   MMA_AVVI4I4I4): New define_int_iterator.
>   (acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
>   avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
>   avvi4i4i4): New define_int_attr.
>   (*movpxi): Add zero constant alternative.
>   (mma_assemble_pair, mma_assemble_acc): New define_expand.
>   (*mma_assemble_acc): New define_insn_and_split.
>   (mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
>   mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
>   mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
>   mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
> * config/rs6000/rs6000.md ('type' attribute): Add mma type.

(mma) : New 'type' attribute.

> * config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
>   (UNSPEC_VSX_XVCVSPBF16): Likewise.
>   (XVCVBF16): New define_int_iterator.
>   (xvcvbf16): New define_int_attr.
>   (vsx_<xvcvbf16>): New define_insn.
> * doc/extend.texi: Document the mma built-ins.
>

I've read through the rest of this patch. Nothing else jumps out at me.

Thanks,
-Will

<snip>
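
For anyone following the thread, the pass-by-reference versus pass-by-value point in the quoted description can be illustrated with a small portable model. This is only a sketch in plain C and does not use the built-ins the patch adds: acc_model, vec_model, set_acc_zero, ger_pp, ger_pp_internal and demo are hypothetical names, and the 4x4 float tile is an assumed stand-in for an accumulator. It shows a user-facing call that takes the accumulator by pointer, rewritten as a load, a by-value internal call whose input and output are the same object, and a store.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of an accumulator: a 4x4 tile of floats. */
typedef struct { float e[4][4]; } acc_model;
/* Hypothetical model of a vector operand: four floats. */
typedef struct { float f[4]; } vec_model;

/* Internal pass-by-value form: the accumulator is taken as a value and the
   updated value is returned, so the same object is both input and output. */
static acc_model ger_pp_internal(acc_model acc, vec_model a, vec_model b)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc.e[i][j] += a.f[i] * b.f[j];   /* outer product, accumulated */
    return acc;
}

/* User-facing pass-by-reference form: one pointer names the accumulator,
   so a caller cannot pass one accumulator in and get a different one out. */
static void ger_pp(acc_model *acc, vec_model a, vec_model b)
{
    /* Models the early rewrite: load, by-value call, store back. */
    *acc = ger_pp_internal(*acc, a, b);
}

/* Zero the accumulator through its pointer. */
static void set_acc_zero(acc_model *acc)
{
    memset(acc, 0, sizeof *acc);
}

/* Usage: accumulate the same outer product twice into one accumulator. */
static float demo(void)
{
    acc_model acc;
    vec_model a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    vec_model b = {{1.0f, 1.0f, 1.0f, 1.0f}};

    set_acc_zero(&acc);
    ger_pp(&acc, a, b);
    ger_pp(&acc, a, b);

    assert(acc.e[0][0] == 2.0f);   /* 1*1 added twice */
    return acc.e[3][0];            /* 4*1 added twice */
}
```

The block deliberately has no main; calling demo() from a test harness exercises both forms. In the model the by-value form costs struct copies, whereas the description's point is that on the real hardware the by-value internal form lets the compiler keep the value in one accumulator register.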