Thanks @tqchen and @Hzfengsy for your valuable feedback. We are trying out some of your suggestions and will follow up once we have run some evaluations and trials.
> As we know, using TensorCores will decrease precision. So, NVIDIA set up a switch to turn TensorCores on and off in cuBLAS and cuDNN (by default they are not used). At least we should let users decide whether to use them.

I doubt that "using TensorCores will decrease precision" holds when the inputs are already in fp16 or int8. We did try adding an "enable_tensor_core" option to tvm.build_config, but it seems that build_config cannot be passed through AutoTVM's build step. Any suggestion on where to add this option is welcome. Eventually, though, I think we will not need the option at all once the implementation has proven robust enough. For example, in TensorFlow, MatMul/Conv on fp16 data uses the TensorCore kernels of cuBLAS/cuDNN by default. (A rough sketch of how the flag might look is at the end of this comment.)

> In the Volta Architecture Whitepaper, TensorCores perform the products in full precision, rather than half precision. I recommend changing the pattern into A/B -> Load -> Cast -> Mul -> Add if we still use the pattern-matching solution.

Thanks for correcting my understanding. So it seems the TensorCore operation is more like *c = float(a) \* float(b) + c* than *c = float(a \* b) + c*.
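To make that difference concrete, here is a small numpy sketch (my own illustration, not TVM code) of the two accumulation orders. With fp16 inputs of even moderate magnitude, multiplying in fp16 before the cast overflows, while casting each operand first does not:

```python
import numpy as np

a = np.float16(300.0)
b = np.float16(300.0)
c = np.float32(0.0)

# Pattern I had assumed: c = float(a * b) + c  (product computed in fp16)
v1 = np.float32(a * b) + c               # 300*300 = 90000 overflows fp16 -> inf

# Whitepaper semantics:   c = float(a) * float(b) + c  (product in full precision)
v2 = np.float32(a) * np.float32(b) + c   # 90000.0 exactly

print(v1, v2)                            # inf 90000.0
```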
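And returning to the "enable_tensor_core" flag mentioned above, below is a minimal sketch of how we imagined it being used. The flag itself is hypothetical (it is exactly the option that we could not get through AutoTVM), and the surrounding fp16 GEMM schedule is just a plain illustration using the current tvm.build_config/tvm.build flow:

```python
import tvm

n = 16
A = tvm.placeholder((n, n), dtype="float16", name="A")
B = tvm.placeholder((n, n), dtype="float16", name="B")
k = tvm.reduce_axis((0, n), name="k")
# Accumulate in fp32, matching the Cast -> Mul -> Add pattern discussed above.
C = tvm.compute(
    (n, n),
    lambda i, j: tvm.sum(A[i, k].astype("float32") * B[k, j].astype("float32"),
                         axis=k),
    name="C")
s = tvm.create_schedule(C.op)
i, j = s[C].op.axis
s[C].bind(i, tvm.thread_axis("blockIdx.x"))
s[C].bind(j, tvm.thread_axis("threadIdx.x"))

# "enable_tensor_core" is the proposed (hypothetical) switch; it is not a
# key that tvm.build_config accepts today.
with tvm.build_config(enable_tensor_core=True):
    mod = tvm.build(s, [A, B, C], target="cuda")
```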