@yangjunpro Really happy to see another solution for TensorCore. 

You are right! I just extended the TVM intrinsics to support it. It does cause 
programmers who write the schedule some trouble: it is not easy to write a 
high-performance schedule.

I'm really curious about how to use IR passes to recognize the pattern. Does it 
need to be split into several loops of 16 in the Python code? I would appreciate 
it if you could show me some details and simple examples.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/4052#issuecomment-537821079
