fhahn added a comment.

Thanks for putting up the patch!

Do you think it would be possible to get the desired behavior without a new 
builtin? We should be able to combine the add with the initial multiply for 
each vector, as long as we have the right fast-math flags. IIUC reassociate 
should be enough, so perhaps this optimization could be performed in 
`LowerMatrixIntrinsics` directly. The user should then be able to enable the 
right fast-math flags locally using `#pragma clang fp`, like below. Clang 
first needs to be updated to handle those pragmas properly for the matrix 
types, though.

  #pragma clang fp reassociate(on)
  C = A*B + C;
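
To make the idea concrete, here is a rough scalar sketch of the two evaluation orders. It is not what `LowerMatrixIntrinsics` actually emits; the `Matrix`/`Column` types and both function names are made up for illustration, and plain `std::array` stands in for the matrix types. The second form seeds each accumulator with `C`, so the add is folded into what would otherwise be the initial multiply. The summation order changes, which is why reassociation has to be allowed.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper types for illustration only; not Clang's matrix types.
constexpr std::size_t N = 2;
using Column = std::array<float, N>;
using Matrix = std::array<Column, N>;  // column-major: M[j][i] is row i, column j

// Unfused: compute A*B first, then add C in a separate step.
Matrix mulThenAdd(const Matrix &A, const Matrix &B, const Matrix &C) {
  Matrix R{};
  for (std::size_t j = 0; j < N; ++j) {
    for (std::size_t i = 0; i < N; ++i) {
      // The first term is a plain multiply; later terms are multiply-adds.
      float acc = A[0][i] * B[j][0];
      for (std::size_t k = 1; k < N; ++k)
        acc += A[k][i] * B[j][k];
      R[j][i] = acc + C[j][i];  // separate add at the end
    }
  }
  return R;
}

// Fused: start from C, so every term (including the first) is a multiply-add.
// This evaluates ((C + a0*b0) + a1*b1) instead of ((a0*b0 + a1*b1) + C),
// which is only a legal rewrite when reassociation is permitted.
Matrix fusedMulAdd(const Matrix &A, const Matrix &B, const Matrix &C) {
  Matrix R = C;
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t k = 0; k < N; ++k)
      for (std::size_t i = 0; i < N; ++i)
        R[j][i] += A[k][i] * B[j][k];
  return R;
}
```

For small integer-valued inputs both forms give identical results; in general they can differ in the last bits, hence the fast-math requirement.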


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99433/new/

https://reviews.llvm.org/D99433

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
