Thanks @tqchen and @Hzfengsy for your valuable feedback. We are trying out 
some of your suggestions and will follow up with you after we have 
made some evaluations and trials.

> As we know using TensorCores will decrease precision. So, NVIDIA set up a 
> switch to turn on and off TensorCores in CUBLAS and CUDNN (default not use 
> TensorCores). At least we should let users determine whether use them.

I doubt that "using TensorCores will decrease precision" when the inputs are 
already in fp16 or int8. We did try to add an "enable_tensor_core" option to 
tvm.build_config, but it seems that build_config can't be passed through to the 
AutoTVM build step. Any suggestion on where to add this option is welcome. That 
said, I think we will eventually not need the option at all, once the 
implementation is proven to be robust enough. For example, TensorFlow's 
MatMul/Conv on fp16 data uses the TensorCore kernels of cublas/cudnn by default.
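To illustrate why TensorCores shouldn't lose precision on data that is already fp16: the hardware multiplies fp16 inputs but accumulates in fp32, which is strictly more accurate than a pure-fp16 multiply-accumulate. A quick NumPy sketch (not TVM code; the array sizes and seed are arbitrary) comparing the two accumulation styles against an fp64 reference:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float16)
b = rng.standard_normal(4096).astype(np.float16)

# fp64 reference dot product
ref = float(np.dot(a.astype(np.float64), b.astype(np.float64)))

# pure fp16: multiply and accumulate in half precision
acc16 = np.float16(0)
for x, y in zip(a, b):
    acc16 += x * y

# TensorCore-style: fp16 inputs, fp32 accumulator
acc32 = np.float32(0)
for x, y in zip(a, b):
    acc32 += np.float32(x) * np.float32(y)

print(abs(float(acc16) - ref))  # large error from fp16 accumulation
print(abs(float(acc32) - ref))  # much smaller error
```

The fp32-accumulated result lands far closer to the reference, so for fp16 inputs the TensorCore path is at worst as accurate as an ordinary fp16 kernel.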

> In Volta Architecture Whitepaper, TensorCores do production in full 
> precision, rather than half precision. I recommend changing the pattern into 
> A/B -> Load -> Cast -> Mul -> Add if we still use pattern matching solution.

Thanks for correcting my understanding. So the TensorCore 
operation is more like *c = float(a)\*float(b) + c* than *c = float(a\*b) + c*.
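The distinction matters because the product of two fp16 values (11-bit significands) needs up to 22 significand bits, which fits exactly in fp32 (24 bits) but must be rounded in fp16. A small NumPy sketch of the two orderings, using an input chosen so the rounding is visible:

```python
import numpy as np

# 1 + 2^-10: the smallest fp16 value above 1.0
a = np.float16(1.0) + np.float16(2.0 ** -10)

# float(a*b): multiply in fp16 first, then widen -- the fp16
# multiply rounds away the 2^-20 term of the true product
half_then_cast = np.float32(a * a)

# float(a)*float(b): widen first, then multiply in fp32 -- the
# product 1 + 2^-9 + 2^-20 is represented exactly
cast_then_mul = np.float32(a) * np.float32(a)

print(half_then_cast, cast_then_mul)  # the two results differ
```

So matching the pattern as Cast -> Mul (rather than Mul -> Cast) is what makes the lowered expression faithful to the hardware semantics.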

-- 
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/4105#issuecomment-541282259
