Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread 孙敏敏
Thanks @tqchen and @Hzfengsy for your valuable feedback. We are trying out some of your suggestions and will have further discussions with you after we have made some evaluations and trials. > As we know using TensorCores will decrease precision. So, NVIDIA set up a > switch to turn on and off Te
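The precision point can be illustrated numerically. Below is a small, hypothetical NumPy sketch (not from this thread) that emulates TensorCore-style fp16 inputs with either an fp16 or fp32 accumulator; the larger error of the fp16 accumulator is what motivates NVIDIA's on/off switch.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((16, 16)).astype(np.float16)
b = rng.standard_normal((16, 16)).astype(np.float16)

def matmul_acc(a, b, acc_dtype):
    """Matmul where the running accumulator is rounded to acc_dtype after each
    step, emulating fp16 vs fp32 accumulation (products kept exact for clarity)."""
    prod = a.astype(np.float64)[:, :, None] * b.astype(np.float64)[None, :, :]
    acc = np.zeros((a.shape[0], b.shape[1]), dtype=acc_dtype)
    for k in range(a.shape[1]):
        acc = (acc.astype(np.float64) + prod[:, k, :]).astype(acc_dtype)
    return acc.astype(np.float64)

ref = a.astype(np.float64) @ b.astype(np.float64)
err16 = np.abs(matmul_acc(a, b, np.float16) - ref).max()
err32 = np.abs(matmul_acc(a, b, np.float32) - ref).max()
print(err16 > err32)  # fp16 accumulation shows the larger error
```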

Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread Orion34C
> * It shocks me that your solution is even faster than CUBLAS and CUDNN. I tried > to reproduce the result but failed. Did you use BatchMatMul and BatchConv? And > which GPU did you test on? Could you show me the details about the > performance? Our fp16 TensorCore kernels are tuned on V100 with

Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread Andrew Tulloch
This is really impressive work, congrats! -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/dmlc/tvm/issues/4105#issuecomment-541259191

Re: [dmlc/tvm] [RFC] Add AVX512VNNI support for TVM (#3388)

2019-10-11 Thread Jianyu Huang
> Hi @jianyuh, I am getting the following error when I try to run my benchmark: > > ``` > LLVM ERROR: Cannot select: 0x23809ef0: v16i32 = X86ISD::VPDPBUSD 0x210a09a8, > 0x210a02c0, 0x19eb81b0 > 0x210a09a8: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, > Consta

[dmlc/tvm] [COMMUNITY] soiferj -> reviewer (#4108)

2019-10-11 Thread Thierry Moreau
Please join me in welcoming @soiferj as a new reviewer of the Apache TVM project. Jon has extended TOPI with new operators, extended coverage of the ONNX and TensorFlow Relay frontends, and added IR passes to combine dense ops in parallel, among other contributions. - [Commits](https://github.com/dmlc/

Re: [dmlc/tvm] [RFC] Add AVX512VNNI support for TVM (#3388)

2019-10-11 Thread Animesh Jain
Hi @jianyuh, I am getting the following error when I try to run my benchmark: ~~~ LLVM ERROR: Cannot select: 0x23809ef0: v16i32 = X86ISD::VPDPBUSD 0x210a09a8, 0x210a02c0, 0x19eb81b0 0x210a09a8: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Cons
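For context on the error: the `X86ISD::VPDPBUSD` node LLVM fails to select corresponds to the AVX512-VNNI `vpdpbusd` instruction, which can only be selected when the target CPU enables the `avx512vnni` feature (e.g. an `-mcpu=cascadelake` target; whether that matches this benchmark's setup is an assumption). As a simplified, hypothetical sketch (not from this thread), one 32-bit lane of `vpdpbusd` computes:

```python
def vpdpbusd_lane(acc, u8x4, s8x4):
    """One 32-bit lane of vpdpbusd: acc (i32) += dot(4 unsigned bytes, 4 signed bytes).
    Simplified model: i32 wraparound (and the saturating vpdpbusds variant) not shown."""
    assert all(0 <= u <= 255 for u in u8x4)
    assert all(-128 <= s <= 127 for s in s8x4)
    return acc + sum(u * s for u, s in zip(u8x4, s8x4))

print(vpdpbusd_lane(10, [1, 2, 3, 4], [5, 6, 7, 8]))  # 10 + (5 + 12 + 21 + 32) = 80
```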

Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread Siyuan Feng
Thank you for the RFC. It is a complete TensorCore solution. It is nice that you can support different types and different data layouts, which my solution does not currently support. ## Lower Passes vs Intrinsic Intrinsic is a tool for describing what instructions can be done in specific hardwa

Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread Tianqi Chen
Thanks for the RFC; also cross-linking to https://github.com/dmlc/tvm/issues/4052. ## Non-standard buffer allocation We are moving toward using special memory scopes to annotate special memory (e.g. mma). The use of ```new_expr``` was convenient, but nevertheless a bit too close to low level

Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread Jun Yang
> Awesome solution! Just curious: for shapes which are worse than cudnn/cublas, > what kind of tuning is used? Good point! We do have some internal discussions about whether we need to automatically search the schedule space based on performance between TensorCore and non-TensorCore kernel, sin

Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread 孙敏敏
> Awesome solution! Just curious: for shapes which are worse than cudnn/cublas, > what kind of tuning is used? We haven’t spent much effort on performance tuning yet. For cases with bad performance we plan to do profiling to figure out the causes first. One possible way of optimization is to m

Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread Bing Xu
Awesome solution! Just curious: for shapes which are worse than cudnn/cublas, what kind of tuning is used? (https://github.com/dmlc/tvm/issues/4105#issuecomment-541014088)

Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread 孙敏敏
#4052 @Hzfengsy (https://github.com/dmlc/tvm/issues/4105#issuecomment-540978699)

[dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-11 Thread 孙敏敏
We propose a solution for TensorCore CodeGen with significant transparency, flexibility and usability. In this solution, the algorithm description and schedule for TensorCore CodeGen are no different from those of normal CUDA CodeGen. All the information needed by the wmma API, such as matrix_a/matr
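To make the wmma mapping concrete, here is an illustrative NumPy sketch (an assumption for exposition, not the RFC's code) of a matmul decomposed into the 16x16x16 fragments the CUDA wmma API consumes; each inner step corresponds to `wmma::load_matrix_sync` on matrix_a/matrix_b fragments and a `wmma::mma_sync` into the accumulator fragment.

```python
import numpy as np

WMMA_M = WMMA_N = WMMA_K = 16  # the fragment shape most commonly used by wmma

def wmma_style_matmul(A, B):
    """Tile an fp16 matmul into 16x16x16 fragments, accumulating in fp32."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % WMMA_M == 0 and N % WMMA_N == 0 and K % WMMA_K == 0
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(0, M, WMMA_M):
        for j in range(0, N, WMMA_N):
            acc = np.zeros((WMMA_M, WMMA_N), dtype=np.float32)  # accumulator fragment
            for k in range(0, K, WMMA_K):
                a_frag = A[i:i + WMMA_M, k:k + WMMA_K]  # matrix_a fragment
                b_frag = B[k:k + WMMA_K, j:j + WMMA_N]  # matrix_b fragment
                acc += a_frag.astype(np.float32) @ b_frag.astype(np.float32)  # mma_sync
            C[i:i + WMMA_M, j:j + WMMA_N] = acc  # store_matrix_sync
    return C

A = np.random.default_rng(1).standard_normal((32, 32)).astype(np.float16)
B = np.random.default_rng(2).standard_normal((32, 32)).astype(np.float16)
assert np.allclose(wmma_style_matmul(A, B),
                   A.astype(np.float32) @ B.astype(np.float32), atol=1e-3)
print("ok")
```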