> Awesome solution! Just curious: for shapes which perform worse than cudnn/cublas, 
> what kind of tuning is used?

Good point! We have had some internal discussions about whether we should 
automatically search the schedule space based on the relative performance of 
TensorCore and non-TensorCore kernels, since the TensorCore implementation may 
not beat the non-TensorCore version for every shape. This is one of the planned 
features, and further comments and input are welcome. One possible solution is 
to expose TensorCore as another schedule configuration knob and let the 
auto-tuner decide whether to turn it on. Another potential solution is to decide 
in the IR pass, with heuristics, whether a given shape is likely to perform 
better with TensorCore. Both solutions have pros and cons. With the former, the 
configuration space is enlarged, so tuning takes somewhat longer. With the 
latter, the decision is made internally in the IR pass, so the tuning space 
stays almost the same, but we take on a dependency on the accuracy of the 
heuristics. Although the hardware nature of TensorCore makes it fairly clear 
whether a shape is TensorCore-friendly, there is still a possibility that we 
choose a low-performance kernel. 
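To make the second option concrete, here is a minimal sketch of what such a shape heuristic might look like. This is purely illustrative, not TVM's actual pass: the function name, the size cutoff `min_flops`, and its value are all hypothetical. The alignment check reflects the real WMMA constraint that fp16 fragments cover 16x16x16 tiles.

```python
def tensorcore_friendly(m, n, k, frag=16, min_flops=1 << 20):
    """Illustrative heuristic (not TVM's actual pass): guess whether a
    GEMM of shape (m, n, k) is likely to benefit from TensorCore."""
    # WMMA fragments cover 16x16x16 tiles for fp16, so dimensions that
    # are not multiples of the fragment size would need padding.
    aligned = m % frag == 0 and n % frag == 0 and k % frag == 0
    # Hypothetical size cutoff: tiny GEMMs rarely amortize the setup
    # cost of the TensorCore pipeline (2*m*n*k is the GEMM FLOP count).
    big_enough = 2 * m * n * k >= min_flops
    return aligned and big_enough
```

An IR pass could call such a predicate per matmul and fall back to the non-TensorCore schedule when it returns False; the risk described above is exactly that this predicate mispredicts for some shapes.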

https://github.com/dmlc/tvm/issues/4105#issuecomment-541121603