> Awesome solution! Just curious: for shapes which are worse than cudnn/cublas, what kind of tuning is used?
Good point! We have had some internal discussion about whether to automatically search the schedule space and choose between the TensorCore and non-TensorCore kernels based on measured performance, since the TensorCore implementation may not beat the non-TensorCore version for every shape. This is one of the planned features, and further comments and input are welcome.

One possible solution is to expose TensorCore as another schedule configuration knob and let the auto-tuner decide whether to turn it on. Another potential solution is to decide in the IR pass, using heuristics, whether a given shape is likely to perform better with TensorCore.

There are pros and cons to both. The former enlarges the tuner's search space, so tuning takes somewhat longer. The latter keeps the tuning space almost unchanged, but introduces a dependency on the accuracy of the heuristics: although TensorCore's hardware nature should make it fairly clear whether a shape is TensorCore-friendly, there is still a chance we would choose a low-performance kernel.
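To illustrate the heuristic idea, here is a minimal sketch (hypothetical, not actual TVM code) of a shape check. The fragment size of 16 matches the wmma m16n16k16 fragment that Volta TensorCores operate on; the `min_dim` threshold is an illustrative assumption, not a measured cutoff:

```python
# Hypothetical heuristic: guess whether a GEMM of shape (m, n, k) is
# likely to benefit from TensorCore. Not TVM's actual IR pass.

FRAGMENT = 16  # wmma m16n16k16 fragment edge length


def tensorcore_friendly(m: int, n: int, k: int,
                        fragment: int = FRAGMENT,
                        min_dim: int = 32) -> bool:
    # All dimensions must tile evenly into wmma fragments,
    # otherwise padding overhead eats into the speedup.
    if m % fragment or n % fragment or k % fragment:
        return False
    # Very small problems rarely amortize the wmma setup cost.
    # min_dim is a made-up threshold for illustration only.
    return min(m, n, k) >= min_dim


if __name__ == "__main__":
    print(tensorcore_friendly(256, 256, 256))  # divisible by 16, large enough
    print(tensorcore_friendly(100, 256, 256))  # 100 is not a multiple of 16
```

In practice such a check would sit alongside the schedule-knob approach: the heuristic prunes clearly unfriendly shapes up front, and the tuner measures the rest.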