Thanks for the RFC; also cross-linking to https://github.com/dmlc/tvm/issues/4052.
## Non-standard Buffer Allocation

We are moving toward using special memory scopes to annotate special memory (e.g. mma). The use of `new_expr` was convenient, but nevertheless a bit too low level, and it overlaps with what we can do with special memory scopes. Adding `new_expr` to Realize seems to lock in that decision even earlier, which I would not recommend.

Here is an alternative solution: introduce a new scope for the special memory needed for lowering; then a special rule can be used to generate the corresponding memory. Of course, additional hints may be needed to lower the allocation code; you can likely embed that additional information in a special AttrStmt outside the allocation scope.

## Place of Pattern Matching

From my reading of the RFC, the early pattern matching is done before flattening and depends on the compute structure. I wonder if we could decouple this: with some annotations, run some of the rewriting after storage flatten. Of course, the low-level code does not enjoy the benefit of multi-dimensional indices, but the access pattern can still be detected by DetectLinearEquation.

One possible limitation of the current approach is whether we can support operations like conv2d, since we would need to explicitly express the compute in this form (which is fine for now).

## Complement and Combine with Tensor Intrinsics Based TensorCore Support

It would be great to hear more thoughts from @Hzfengsy @minminsun about how we can combine the tensor-intrinsics-based approach with the more automatic pattern-detector one, e.g. https://github.com/dmlc/tvm/issues/4052. Our philosophy has always been to first enable manual scheduling options that give us a way to specify the search space, then build automation on top. This lets us cover a spectrum of approaches: use the more manual one when necessary, and build more diverse automated solutions on top.
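As a side note on the pattern-matching point above: the kind of information DetectLinearEquation recovers from a flattened access can be illustrated with a small standalone sketch. This is a numeric probe for illustration only, not TVM's actual implementation, which works symbolically on the IR expression:

```python
# Toy illustration of linear-access detection: given a flat index expression
# f(i0, ..., i_{n-1}), recover coefficients c_k and base b such that
# f = sum(c_k * i_k) + b, or report failure if the access is not linear.
def detect_linear(index_fn, nvars):
    base = index_fn(*([0] * nvars))
    coeffs = []
    for k in range(nvars):
        point = [1 if i == k else 0 for i in range(nvars)]
        coeffs.append(index_fn(*point) - base)
    # Sanity-check linearity at a non-trivial point.
    sample = list(range(2, nvars + 2))
    if index_fn(*sample) != sum(c * v for c, v in zip(coeffs, sample)) + base:
        return None
    return coeffs, base

# Row-major access A[i][j] into a 16x16 tile at offset 128:
print(detect_linear(lambda i, j: i * 16 + j + 128, 2))  # ([16, 1], 128)
print(detect_linear(lambda i, j: i * j, 2))             # None (not linear)
```

Once the strides and base are known, matching them against the layout an intrinsic expects becomes a simple comparison, even after flattening has erased the multi-dimensional structure.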
Our eventual goal would still be to unify all tensorization support under tensor intrinsics, and build automation on top. One idea would be to still declare the lowering rules via tensor intrinsics, but reuse the pattern-matching techniques in this RFC to rewrite to hints that apply the tensor intrinsics. This way we can organically combine the two ideas.
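To make the "rewrite to hints" idea concrete, here is a hypothetical sketch of the flow: a pattern detector tags a region with a hint attribute, and a separate lowering step keyed on registered intrinsic rules consumes the hint. All names here (`INTRIN_RULES`, `detect_and_hint`, `"tensor_core.wmma_16x16x16"`) are made up for illustration and are not TVM APIs:

```python
# Registry of lowering rules, conceptually declared via tensor intrinsics.
INTRIN_RULES = {}

def register_intrin(name):
    def reg(fn):
        INTRIN_RULES[name] = fn
        return fn
    return reg

@register_intrin("tensor_core.wmma_16x16x16")
def lower_wmma(region):
    return f"wmma.mma.sync on {region}"

def detect_and_hint(region, shape):
    # Stand-in for the RFC's pattern matcher: a 16x16x16 matmul tile
    # gets annotated with the intrinsic it should lower to.
    if shape == (16, 16, 16):
        return {"region": region, "hint": "tensor_core.wmma_16x16x16"}
    return {"region": region, "hint": None}

def lower(annotated):
    # Lowering consumes the hint; unmatched regions take the generic path.
    hint = annotated["hint"]
    if hint in INTRIN_RULES:
        return INTRIN_RULES[hint](annotated["region"])
    return f"fallback scalar loop on {annotated['region']}"

print(lower(detect_and_hint("C[0:16, 0:16]", (16, 16, 16))))
```

The point of the split is that detection and lowering stay decoupled: the matcher only emits hints, so the same intrinsic declarations can serve both the manual tensorize path and the automatic one.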