Thanks for the reply.
* PyTorch -> Relay -> Ansor -> TVM's low-level code -> LLVM/NVCC (LLVM was used above); a sketch of this pipeline follows below
* Both CPU and GPU (in particular, an NVIDIA T4)
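For reference, here is a minimal sketch of that pipeline. The single-layer encoder, input shape, `llvm` target, and tiny tuning budget are stand-in assumptions, not the actual setup:

```python
import torch
import tvm
from tvm import relay, auto_scheduler

# Stand-in for the model under discussion: a single-layer Transformer
# encoder, traced with TorchScript so Relay can import it.
layer = torch.nn.TransformerEncoderLayer(d_model=256, nhead=4).eval()
inp = torch.randn(8, 1, 256)
scripted = torch.jit.trace(layer, inp)
mod, params = relay.frontend.from_pytorch(scripted, [("input", (8, 1, 256))])

# "llvm" targets the CPU; "cuda" would target the T4 instead.
target = tvm.target.Target("llvm")

# Ansor: extract tuning tasks, search, then build with the best records.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tuner.tune(auto_scheduler.TuningOptions(
    num_measure_trials=200,  # toy budget; real runs use far more trials
    measure_callbacks=[auto_scheduler.RecordToFile("ansor.json")],
))
with auto_scheduler.ApplyHistoryBest("ansor.json"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
```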
---
First of all, Ansor is not a good fit for int8, since it cannot make use of fast int8 hardware (VNNI, Tensor Cores) at all.
* How are you quantizing the model? (One TVM-side option is sketched below.)
* What backends are you interested in? CPU or GPU?
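For context, the two usual answers are quantizing in PyTorch and importing the result through the QNN frontend, or letting TVM quantize the float model itself with `relay.quantize`. A minimal sketch of the latter follows; the toy dense layer and the crude global-scale calibration are purely illustrative assumptions:

```python
import numpy as np
import tvm
from tvm import relay

# Toy float32 module standing in for the imported Transformer.
data = relay.var("data", shape=(1, 256), dtype="float32")
weight = relay.var("weight", shape=(256, 256), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(data, weight))
params = {"weight": tvm.nd.array(np.random.rand(256, 256).astype("float32"))}

# relay.quantize rewrites the graph to int8 ops; global_scale calibration
# is the simplest mode and is used here only for illustration.
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    qmod = relay.quantize.quantize(mod, params)
print(qmod)
```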
---
Hi,
I'm trying to use TVM's stack to deploy INT8-quantized Transformer-based models.
I tried Relay + Ansor (AutoScheduler) on a single-layer Transformer, and the results were not great:
|Backend|Original (ms)|Quantized (ms)|
| --- | --- | --- |
|PyTorch|20|--|
|TVM (Relay, optimized)|130|120|
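For what it's worth, numbers like these can be measured with the graph executor's `benchmark` helper. Below is a self-contained sketch; the single dense layer and the repeat counts are stand-in assumptions, not the actual benchmark:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Stand-in workload (a single dense layer, not the real Transformer).
data = relay.var("data", shape=(1, 256), dtype="float32")
weight = relay.const(np.random.rand(256, 256).astype("float32"))
mod = tvm.IRModule.from_expr(relay.nn.dense(data, weight))

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")

dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.rand(1, 256).astype("float32"))
# benchmark() reports mean/std over `repeat` batches of `number` runs each.
print(module.benchmark(dev, number=10, repeat=3))
```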