Yes, thanks again for your reply.

I just verified [tutorial_eager.py](https://github.com/Edgecortix-Inc/pytorch_quantization/blob/master/tutorial_eager.py) with torch-nightly (v1.6) on a MacBook Pro, and got the 2-4x speed-up that the [static_quantization_tutorial](https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html#quantization-aware-training) reports. However, the top-1 accuracy drops about 4 points for `per_channel_quantized_model`, so I will try QAT in PyTorch to recover accuracy, following your advice (a sketch of what I plan to try is at the end of this post). A few questions:

1. **Deploy a Quantized Model on CUDA** is executed on CUDA; does it also support running inference on a CPU, such as a MacBook Pro or an Android device?
2. For high accuracy and a large speed-up, which approach do you recommend: **TVM converting a quantized torch model**, or **TVM's own quantization, independent of frameworks (e.g. TF, PyTorch, and so on)**?
3. Have you ever evaluated the accuracy and speed-up of the two approaches?
4. From the perspective of TVM framework design, which might be better or more reasonable for TVM's development in the future?

Please excuse my many questions; I hope to discuss TVM with you smart and kind-hearted guys.
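Here is a minimal sketch of the eager-mode QAT flow I plan to try, following the PyTorch static quantization tutorial. The fine-tuning loop is omitted, and torchvision's quantizable MobileNetV2 is just a stand-in for the actual model:

```python
import torch
from torch.quantization import get_default_qat_qconfig, prepare_qat, convert
from torchvision.models.quantization import mobilenet_v2

# The quantizable MobileNetV2 already contains QuantStub/DeQuantStub.
model = mobilenet_v2(pretrained=True, quantize=False)
model.fuse_model()  # fuse Conv+BN+ReLU modules before QAT
model.train()

# 'fbgemm' targets x86 CPUs; 'qnnpack' would target ARM (e.g. Android).
model.qconfig = get_default_qat_qconfig('fbgemm')
prepare_qat(model, inplace=True)

# ... fine-tune for a few epochs here so the fake-quantization
# observers and batch-norm statistics adapt (training loop omitted) ...

model.eval()
quantized_model = convert(model, inplace=False)  # int8 model for CPU inference
```

And this is roughly the conversion path from tutorial_eager.py that I verified, i.e. importing the quantized torch model into TVM; exact API names may differ across TVM versions:

```python
import torch
import tvm
from tvm import relay

# Trace the quantized model (quantized_model from the sketch above).
input_shape = (1, 3, 224, 224)
script_module = torch.jit.trace(quantized_model, torch.rand(input_shape)).eval()

# Convert the TorchScript module to Relay.
mod, params = relay.frontend.from_pytorch(script_module, [('input', input_shape)])

# Build for a CPU target; swap 'llvm' for an ARM target when deploying to Android.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target='llvm', params=params)
```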
I just verified [ tutorial_eager.py](https://github.com/Edgecortix-Inc/pytorch_quantization/blob/master/tutorial_eager.py) @torch-nightly(v1.6) @macbook pro, and get the 2-4x speed-up as the [static_quantization_tutorial](https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html#quantization-aware-training) gives. However, the top-1 accuracy drops about 4 point for per_channel_quantized_model, so I will try QAT in Torch to get extra accuracy following your advice. 1. **Deploy a Quantized Model on Cuda** is executed on cuda, dose it support run inference on cpu like mbp or android? 2. Which way do you recommend for high-accuracy and big-speed-up quantization, between **TVM converting quantized torch model** and **TVM quantization independent of frameworks(e.g. tf, pytroch, and so on)** ? 3. Have you ever evaluate the accuracy and speed-up of the two way? 4. Which might be better or reasonable for tvm's development in the future from the perspective of TVM framework design Please excuse my lots of question, hope to discuss tvm with you smart and kind-hearted guys. --- [Visit Topic](https://discuss.tvm.ai/t/is-there-any-speed-comparison-of-quantization-on-cpu/6256/10) to respond. You are receiving this because you enabled mailing list mode. To unsubscribe from these emails, [click here](https://discuss.tvm.ai/email/unsubscribe/a9607f4ade85c5c91fa59713f6d77286c36eb3ad6a3195a52547c2d75f98d0ea).