fhahn added a comment.

In D99433#2653586 <https://reviews.llvm.org/D99433#2653586>, @everton.constantino wrote:
> @fhahn That was my first idea, however it's not as simple as it looks. I tried moving the adds, but splats make it considerably harder to find a pattern that catches this and fuses the multiplies, especially with bigger matrices. My real wish was to actually add a new IR instruction to handle matrices, because the MADD is but a simple example of other more interesting optimizations that can be done, like using matrix associative properties to reduce the number of calculations. I found that path too complicated, however, and I opted for a compromise at the moment. I wish to start writing some GEMM micro-kernels with this extension, and this builtin was the shortest path.

Could you elaborate on the splats that make this tricky? Before the matrix lowering, there should be no splats: https://godbolt.org/z/r941xsc6b. I was thinking of detecting the multiply/add before we do the actual lowering, e.g. like it is already done for `{load, load} -> multiply -> store` chains in `LowerMatrixMultiplyFused`: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp#L1346

It might still be convenient to have a separate multiply-add intrinsic for matrices, because then we could just replace `fadd(@matrix.multiply(), X)` before lowering. But I am not sure how scalable this will be (I don't think we want too many intrinsics), so perhaps we could keep track of bundles of instructions to lower together in general. But I don't think we need that for the initial optimization.
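For illustration, a minimal sketch (not part of this patch; the helper name `matchMultiplyAdd` is made up) of how a `fadd(@matrix.multiply(), X)` chain could be recognized before the lowering runs, along the lines of the existing `LowerMatrixMultiplyFused` matching:

```cpp
// Hypothetical sketch, not the patch itself: recognize an fadd fed by a
// single-use llvm.matrix.multiply before the lowering pass expands it,
// similar in spirit to how LowerMatrixMultiplyFused matches
// {load, load} -> multiply -> store chains.
#include "llvm/IR/Instruction.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"

using namespace llvm;

// If I is an fadd with a single-use llvm.matrix.multiply feeding one of its
// operands, return that multiply and set Addend to the other operand;
// otherwise return nullptr.
static IntrinsicInst *matchMultiplyAdd(Instruction &I, Value *&Addend) {
  if (I.getOpcode() != Instruction::FAdd)
    return nullptr;
  // fadd is commutative, so the multiply may be either operand.
  for (unsigned OpIdx = 0; OpIdx != 2; ++OpIdx) {
    auto *Mul = dyn_cast<IntrinsicInst>(I.getOperand(OpIdx));
    if (Mul && Mul->getIntrinsicID() == Intrinsic::matrix_multiply &&
        Mul->hasOneUse()) {
      Addend = I.getOperand(1 - OpIdx);
      return Mul;
    }
  }
  return nullptr;
}
```

The recognized pair could then be handed to the fused-lowering path instead of being lowered separately, subject to whatever FP-contraction checks the pass already applies to fused multiplies.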