[Apache TVM Discuss] [Questions] Quantized Transformer

2022-01-07 Thread Jason Huh via Apache TVM Discuss
Thanks for the reply.

* PyTorch -> Relay -> Ansor -> TVM's low-level code -> LLVM/NVCC (LLVM was used for the results above)
* Both CPU and GPU (in particular, NVIDIA T4)
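
For reference, a minimal sketch of the two build targets this pipeline implies; the exact `-mcpu` flag and the `sm_75` architecture for the T4 are assumptions, not details stated in the thread:

```python
# Sketch of the targets implied above; the specific flags are assumptions.
import tvm

# CPU path: Relay -> Ansor -> LLVM
cpu_target = tvm.target.Target("llvm -mcpu=skylake-avx512")

# GPU path: Relay -> Ansor -> NVCC; the NVIDIA T4 is compute capability 7.5
gpu_target = tvm.target.Target("cuda -arch=sm_75")
```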

[Apache TVM Discuss] [Questions] Quantized Transformer

2022-01-05 Thread masahi via Apache TVM Discuss
First of all, Ansor is no good for int8, since it cannot use fast int8 hardware (VNNI, tensor cores) at all.

* How are you quantizing the model?
* What backends are you interested in, CPU or GPU?
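
One way to check this concretely is to dump the generated assembly and look for VNNI's dot-product instruction; a sketch, assuming `lib` is a module already built with `relay.build` for an LLVM CPU target:

```python
# Sketch: inspect the generated CPU assembly for the VNNI instruction vpdpbusd.
# Assumes `lib` is the result of relay.build(...) with an LLVM target.
asm = lib.get_lib().get_source("asm")
print("uses VNNI:", "vpdpbusd" in asm)  # an Ansor-tuned int8 kernel will typically print False
```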

[Apache TVM Discuss] [Questions] Quantized Transformer

2022-01-05 Thread Jason Huh via Apache TVM Discuss
Hi, I'm trying to use TVM's stack to deploy INT8-quantized Transformer-based models. I tried Relay + Ansor (AutoScheduler) on a Transformer (# layers = 1), and the results weren't encouraging:

|Time (ms)|Original|Quantized|
| --- | --- | --- |
|PyTorch|20|--|
|TVM (Relay, optimized)|130|120|
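
For context, here is a rough sketch of the flow being benchmarked (PyTorch post-training static quantization, QNN import via `relay.frontend.from_pytorch`, Ansor tuning, then build). The tiny stand-in module, shapes, quantization backend, target flags, trial count, and log-file name are all placeholder assumptions, since the thread does not say how the model was quantized:

```python
# Hedged sketch of the quantize -> import -> Ansor-tune -> build flow.
# The stand-in module, shapes, target, and file names below are assumptions.
import torch
import tvm
from tvm import relay, auto_scheduler

class TinyBlock(torch.nn.Module):
    """Placeholder for the real Transformer layer."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.linear = torch.nn.Linear(768, 768)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.linear(self.quant(x)))

# Post-training static quantization in PyTorch (calibrate, then convert to int8).
model = TinyBlock().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
model(torch.randn(1, 128, 768))                 # one calibration pass
torch.quantization.convert(model, inplace=True)

# Import the quantized TorchScript model into Relay (QNN ops).
example_input = torch.randn(1, 128, 768)
scripted = torch.jit.trace(model, example_input).eval()
mod, params = relay.frontend.from_pytorch(scripted, [("input0", example_input.shape)])

# Tune with Ansor (auto_scheduler) and build with the tuned log applied.
target = tvm.target.Target("llvm -mcpu=skylake-avx512")
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tuner.tune(auto_scheduler.TuningOptions(
    num_measure_trials=200,
    measure_callbacks=[auto_scheduler.RecordToFile("ansor_int8.json")],
))

with auto_scheduler.ApplyHistoryBest("ansor_int8.json"):
    with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}):
        lib = relay.build(mod, target=target, params=params)
```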