everton.constantino added a comment. @fhahn Ok I see what you mean now, this sounds like a doable path and might be able to cover architectures with specialized matrix multiplication instructions as well .
Just to see if I understand correctly I can add a matrix_add intrinsic, do a travesal looking for matrix_multiply and fuse both changing `LowerMatrixMultiplyFused` to support pre-loading the accumulator. Is that correct? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D99433/new/ https://reviews.llvm.org/D99433 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits