fhahn added a comment.

Thanks for putting up the patch!
Do you think it would be possible to get the desired behavior without a new builtin? We should be able to combine the add with the initial multiply for each vector, as long as we have the right fast-math flags. IIUC reassociate should be enough, so perhaps it would be possible to perform this optimization in `LowerMatrixIntrinsics` directly.

The user should then be able to enable the right fast-math flags locally using `#pragma clang fp`, like below. Clang first needs to be updated to handle those pragmas properly for the matrix types.

  #pragma clang fp reassociate(on)
  C = A*B + C;

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99433/new/

https://reviews.llvm.org/D99433
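
To make the idea of folding the add into the initial multiply concrete, here is a minimal scalar sketch in plain C. It is only an illustration and not what `LowerMatrixIntrinsics` emits: the function names, the fixed size `N`, and the flat column-major layout are all assumptions made for the example. The unfused form sums the products first and adds `C` last; the fused form seeds the accumulator with `C`. The two differ only in the order of the floating-point additions, which is why a compiler would need permission to reassociate before rewriting one into the other.

```c
#define N 2 /* hypothetical matrix dimension, chosen only for illustration */

/* Unfused form: accumulate A*B starting from zero, then add the old C.
   Matrices are flat arrays in column-major order: element (i, j) is m[i + j*N]. */
void matmul_then_add(const float *a, const float *b, float *c) {
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < N; ++i) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += a[i + k * N] * b[k + j * N];
            c[i + j * N] = acc + c[i + j * N]; /* separate add after the multiply */
        }
    }
}

/* Fused form: seed the accumulator with the old C, so the add is absorbed
   into the first multiply-add step. This changes the order of the additions,
   so it is only equivalent to the unfused form if reassociation is allowed. */
void matmul_accumulate(const float *a, const float *b, float *c) {
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < N; ++i) {
            float acc = c[i + j * N];
            for (int k = 0; k < N; ++k)
                acc += a[i + k * N] * b[k + j * N];
            c[i + j * N] = acc;
        }
    }
}
```

For small integer-valued inputs both functions produce identical results, because every intermediate value is exactly representable. For general inputs the results can differ in the last bits, which is the rounding difference that the reassociate flag would permit.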