I have chatted with @minminsun and his team these days. Just as they mentioned in https://github.com/dmlc/tvm/issues/4105#issuecomment-542032766, we can have different frontends but only one backend. In my previous implementation, users could only use fragments with a 16x16x16 shape and row-major layout. To solve this problem, Minmin uses `new_expr`. Here I propose a new design: we use attributes to transmit the metadata. Here is an example:

```
// attr [A.shared.wmma.matrix_a] storage_scope = "wmma.matrix_a"
// attr [A.shared.wmma.matrix_a] fragment_layout = "row_major"
// attr [A.shared.wmma.matrix_a] fragment_shape = "16, 16, 16"
allocate A.shared.wmma.matrix_a[float16 * 1024]
```

This solution has been accepted by Minmin and his team. Thanks for supporting my proposal. Users can set these configurations in the tensor intrinsics:

```python
import tvm

def intrin_wmma_load_matrix(scope):
    n = 16
    A = tvm.placeholder((n, n), name='A', dtype='float16')
    BA = tvm.decl_buffer(A.shape, A.dtype, scope='shared',
                         data_alignment=32, offset_factor=256)
    C = tvm.compute((n, n), lambda i, j: A[i, j], name='C')
    BC = tvm.decl_buffer(C.shape, C.dtype, scope=scope,
                         data_alignment=32, offset_factor=256)

    def intrin_func(ins, outs):
        ib = tvm.ir_builder.create()
        BA = ins[0]
        BC = outs[0]
        # shape (n, n, n) and 'row_major' layout
        ib.emit(tvm.call_intrin('handle', 'tvm_load_matrix_sync',
                                BC.data, n, n, n,
                                BC.elem_offset // 256,
                                BA.access_ptr('r'), n, 'row_major'))
        return ib.get()

    return tvm.decl_tensor_intrin(C.op, intrin_func, binds={A: BA, C: BC})
```

A new IR pass will transmit this information from the intrinsic calls to the attributes shown above.
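To make the flow concrete, here is a minimal toy sketch (an illustration I am adding, not part of the proposal itself; the stage names `AS` and `AF` are assumptions) of how such an intrinsic would be attached with `tensorize`, so that the lowered IR carries the fragment attributes above:

```python
import tvm

n = 16
A = tvm.placeholder((n, n), name='A', dtype='float16')
C = tvm.compute((n, n), lambda i, j: A[i, j], name='C')
s = tvm.create_schedule(C.op)
AS = s.cache_read(A, 'shared', [C])           # stage the data in shared memory
AF = s.cache_read(AS, 'wmma.matrix_a', [C])   # stage it in the fragment scope
# Tensorizing the copy stage replaces it with tvm_load_matrix_sync;
# the proposed IR pass then lifts its shape/layout arguments into the
# fragment_shape / fragment_layout attributes of the allocation.
s[AF].tensorize(AF.op.axis[0], intrin_wmma_load_matrix('wmma.matrix_a'))
```

Under the same design, other WMMA configurations should only change the call's arguments: for example (if I read the WMMA shapes correctly), a 32x8x16 row-major `matrix_a` fragment would pass `32, 8, 16` as the shape arguments, `BC.elem_offset // (32 * 16)` as the fragment index, and `16` as the stride, and the pass would emit `fragment_shape = "32, 8, 16"`.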
I am really happy to cooperate with Minmin's team. Thank you again for contributing to TVM.

cc @tqchen