[PATCH] D99433: [Matrix] Including __builtin_matrix_multiply_add for the matrix type extension.

Florian Hahn via Phabricator via cfe-commits Wed, 31 Mar 2021 07:30:08 -0700

fhahn added a comment.

In D99433#2661357 <https://reviews.llvm.org/D99433#2661357>, 
@everton.constantino wrote:


> @fhahn When I mentioned the splats I was talking about the IR, not the final 
> code. On the Godbolt links you sent, it's the same that I see. However, take a 
> look at the IR your example generates:

Sorry for not being clearer. I meant the IR *before* LowerMatrixIntrinsics is 
run (which should be on the right-hand side of the Godbolt view). I'm also 
posting it below. Unless I am missing something, we should be able to easily 
match `fadd (llvm.matrix.multiply(A, B), C)` before the actual lowering of 
`llvm.matrix.multiply`. I think we already do something similar for combining 
load->multiply->store chains: 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp#L703
Basically, we try to fuse all multiplies before the 'normal' lowering. Would it 
be possible to deal with `fadd (llvm.matrix.multiply(A, B), C)` similarly?

  clang-13: warning: argument unused during compilation: '--gcc-toolchain=/opt/compiler-explorer/gcc-snapshot' [-Wunused-command-line-argument]
  *** IR Dump Before Lower the matrix intrinsics (lower-matrix-intrinsics) ***
  ; Function Attrs: nofree nounwind uwtable willreturn mustprogress
  define dso_local void @_Z3fooRu11matrix_typeILm2ELm2EfES0_S0_([4 x float]* nocapture nonnull readonly align 4 dereferenceable(16) %0, [4 x float]* nocapture nonnull align 4 dereferenceable(16) %1, [4 x float]* nocapture nonnull readonly align 4 dereferenceable(16) %2) local_unnamed_addr #0 {
    %4 = bitcast [4 x float]* %0 to <4 x float>*
    %5 = load <4 x float>, <4 x float>* %4, align 4, !tbaa !6
    %6 = bitcast [4 x float]* %2 to <4 x float>*
    %7 = load <4 x float>, <4 x float>* %6, align 4, !tbaa !6
    %8 = tail call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %5, <4 x float> %7, i32 2, i32 2, i32 2)
    %9 = bitcast [4 x float]* %1 to <4 x float>*
    %10 = load <4 x float>, <4 x float>* %9, align 4, !tbaa !6
    %11 = fadd <4 x float> %8, %10
    store <4 x float> %11, <4 x float>* %9, align 4, !tbaa !6
    ret void
  }
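
To make the arithmetic behind the `fadd (llvm.matrix.multiply(A, B), C)` pattern concrete, here is a minimal standalone C++ sketch. It is not the LowerMatrixIntrinsics pass and uses no LLVM APIs; the names `Mat2x2`, `multiply`, `multiplyThenAdd` and `multiplyAddFused` are hypothetical and exist only for illustration. It assumes the 2x2 column-major layout suggested by the `<4 x float>` values and the `i32 2, i32 2, i32 2` arguments in the IR above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper type: a 2x2 float matrix flattened in column-major
// order, so element (row r, column c) lives at index c * 2 + r.
using Mat2x2 = std::array<float, 4>;

// Models what llvm.matrix.multiply(A, B, 2, 2, 2) computes for 2x2 operands.
Mat2x2 multiply(const Mat2x2 &A, const Mat2x2 &B) {
  Mat2x2 R{}; // zero-initialized result
  for (std::size_t c = 0; c < 2; ++c)
    for (std::size_t r = 0; r < 2; ++r)
      for (std::size_t k = 0; k < 2; ++k)
        R[c * 2 + r] += A[k * 2 + r] * B[c * 2 + k];
  return R;
}

// Unfused form: materialize the product, then do a separate element-wise
// add, like the multiply call followed by the fadd in the IR.
Mat2x2 multiplyThenAdd(const Mat2x2 &A, const Mat2x2 &B, const Mat2x2 &C) {
  Mat2x2 P = multiply(A, B);
  for (std::size_t i = 0; i < P.size(); ++i)
    P[i] += C[i];
  return P;
}

// Fused form: seed the accumulator with C, so no separate add pass over the
// result is needed. This is only one possible way to fold the add in.
Mat2x2 multiplyAddFused(const Mat2x2 &A, const Mat2x2 &B, const Mat2x2 &C) {
  Mat2x2 R = C;
  for (std::size_t c = 0; c < 2; ++c)
    for (std::size_t r = 0; r < 2; ++r)
      for (std::size_t k = 0; k < 2; ++k)
        R[c * 2 + r] += A[k * 2 + r] * B[c * 2 + k];
  return R;
}
```

Note that the fused form adds the terms in a different order than the unfused one, so for general floating-point inputs the two can differ in the last bits. Whether a real transformation may do this depends on the fast-math flags involved, which this sketch does not model.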


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99433/new/

https://reviews.llvm.org/D99433

