Thanks for the RFC; also cross-linking to https://github.com/dmlc/tvm/issues/4052.

## Non-standard buffer allocation

We are moving toward using special memory scopes to annotate special memory 
(e.g. mma). The use of `new_expr` was convenient, but nevertheless a bit too 
low-level, and it overlaps with what we can do with special memory scopes. 
Adding `new_expr` to Realize would entrench that decision even further, which 
I would not recommend.

Here is an alternative solution: introduce a new scope for the special memory 
needed for lowering; then a special lowering rule can be used to generate the 
corresponding memory. Of course, there could be additional hints needed to 
lower the allocation code; you can likely embed that additional information 
in a special AttrStmt outside the allocation scope.
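To make the idea concrete, here is a minimal plain-Python model of that shape: the buffer is allocated with a special scope string, and the extra lowering hint travels in an AttrStmt wrapped around the allocation scope. The node classes, the `"mma.accumulator"` scope name, and the `"fragment_layout"` attribute key are all illustrative assumptions, not actual TVM IR.

```python
class Allocate:
    """Toy allocation node carrying a storage-scope annotation."""
    def __init__(self, buffer_var, dtype, extents, scope, body=None):
        self.buffer_var, self.dtype = buffer_var, dtype
        self.extents, self.scope, self.body = extents, scope, body

class AttrStmt:
    """Toy attribute node attaching a key/value hint to its body."""
    def __init__(self, node, attr_key, value, body):
        self.node, self.attr_key = node, attr_key
        self.value, self.body = value, body

# Allocation in a hypothetical special scope...
alloc = Allocate("C_frag", "float32", [16, 16], scope="mma.accumulator")
# ...with an additional lowering hint embedded outside the allocation scope.
stmt = AttrStmt("C_frag", "fragment_layout", "row_major", body=alloc)
print(stmt.attr_key, stmt.body.scope)  # fragment_layout mma.accumulator
```

A later lowering pass that knows the scope name can then consume both the scope string and the surrounding hint when generating the real allocation code.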

## Place of Pattern Matching

From my reading of the RFC, the early pattern matching is currently done 
before storage flattening and depends on the compute structure.

I wonder if we could de-couple this: with some annotations, run some of the 
rewriting after StorageFlatten. Of course, the low-level code does not enjoy 
the benefit of multi-dimensional indices, but the access pattern can still be 
detected by DetectLinearEquation.
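As a rough illustration of what DetectLinearEquation recovers from a flattened index, here is a plain-Python sketch: given a (presumed linear) index function of the loop variables, it returns the coefficient of each variable plus the constant base. TVM does this symbolically on the TIR expression; the probing helper below is just an assumption-laden stand-in, not TVM API.

```python
def detect_linear_equation(index_fn, num_vars):
    """Return ([coeff_per_var], base), assuming index_fn is linear."""
    base = index_fn(*([0] * num_vars))      # constant term at the origin
    coeffs = []
    for i in range(num_vars):
        point = [0] * num_vars
        point[i] = 1                        # probe with a unit step in var i
        coeffs.append(index_fn(*point) - base)
    return coeffs, base

# Flattened index of A[i][j] + offset 4 for a 16-wide row-major buffer.
coeffs, base = detect_linear_equation(lambda i, j: i * 16 + j + 4, 2)
print(coeffs, base)  # [16, 1] 4
```

Once the per-variable strides are recovered like this, the rewriting pass can still recognize, say, a 16x16 fragment access even without the multi-dimensional indices.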

One possible limitation I see in the current approach is whether we can 
support operations like conv2d, as we would need to explicitly express the 
compute in this form (which is fine for now).

## Complement and Combine with Tensor-Intrinsics-Based TensorCore Support

It would be great to hear more thoughts from @Hzfengsy @minminsun about how we 
can combine the tensor-intrinsics-based approach with the more automatic 
pattern-detection one, e.g. 
https://github.com/dmlc/tvm/issues/4052.

Our philosophy has always been to first enable manual scheduling options that 
give us a way to specify the search space, then build automation on top. This 
allows us to take a spectrum of approaches: use the more manual one if 
necessary, and build more diverse automated solutions.

Our eventual goal would still be to unify all tensorization support under 
tensor intrinsics and build automation on top. One idea is that we still 
declare the lowering rules via tensor intrinsics, but reuse the 
pattern-matching techniques in this RFC to rewrite into hints that apply the 
tensor intrinsics. This way we can organically combine the two ideas.
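A toy sketch of that combination: a pattern matcher detects a multiply-accumulate structure and, instead of emitting code directly, rewrites it into a hint node naming the tensor intrinsic to apply. All node classes and the intrinsic name (`"wmma_16x16x16"`) are hypothetical, not TVM data structures.

```python
class Reduce:
    """Toy reduction node: op ("add", "max", ...) over a body expression."""
    def __init__(self, op, body):
        self.op, self.body = op, body

class Mul:
    """Toy multiplication node."""
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

class IntrinsicHint:
    """Hint telling a later pass which tensor intrinsic to tensorize with."""
    def __init__(self, name, args):
        self.name, self.args = name, args

def rewrite_to_intrinsic(expr):
    """Rewrite sum-of-products reductions into a tensorize hint."""
    if isinstance(expr, Reduce) and expr.op == "add" and isinstance(expr.body, Mul):
        return IntrinsicHint("wmma_16x16x16", (expr.body.lhs, expr.body.rhs))
    return expr  # leave non-matching expressions untouched

hinted = rewrite_to_intrinsic(Reduce("add", Mul("A[i, k]", "B[k, j]")))
print(hinted.name)  # wmma_16x16x16
```

The lowering rule itself would still live with the tensor intrinsic declaration; the pattern matcher from this RFC only decides where to apply it.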
