Yes, thanks again for your reply.

I just verified [tutorial_eager.py](https://github.com/Edgecortix-Inc/pytorch_quantization/blob/master/tutorial_eager.py) with torch-nightly (v1.6) on a MacBook Pro, and got the 2-4x speed-up that the [static_quantization_tutorial](https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html#quantization-aware-training) reports. However, top-1 accuracy drops by about 4 points for per_channel_quantized_model, so I will try QAT in Torch to get the extra accuracy back, following your advice.
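
For the QAT attempt, this is the rough recipe I plan to follow, based on the QAT section of that tutorial. A minimal sketch only: the MobileNetV2 model choice and the commented-out training loop are placeholders.

```python
import torch
import torchvision

# Quantizable MobileNetV2 from torchvision (placeholder; any quantizable model works).
model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=False)
model.train()
model.fuse_model()  # fuse conv+bn+relu before inserting fake-quant modules

# 'fbgemm' targets x86 CPUs; 'qnnpack' would target ARM/Android.
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)

# Fine-tune for a few epochs with fake quantization enabled, e.g.:
# for images, labels in train_loader:
#     loss = criterion(model(images), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()

model.eval()
quantized_model = torch.quantization.convert(model)
```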

1. **Deploy a Quantized Model on CUDA** runs on CUDA; does it also support inference on a CPU, such as a MacBook Pro or Android?
2. For high accuracy and a big speed-up, which approach do you recommend: **TVM converting a quantized torch model** or **TVM quantization independent of frameworks (e.g. TF, PyTorch, and so on)**? (A rough sketch of the first approach is below.)
3. Have you ever evaluated the accuracy and speed-up of the two approaches?
4. From the perspective of TVM framework design, which might be better or more reasonable for TVM's future development?
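
On question 2, my current understanding of the first approach is roughly the following: quantize in PyTorch, trace, and import through relay.frontend.from_pytorch. This is a sketch assuming a recent TVM; quantized_model is the converted model from the QAT sketch above, and the target string is just an example.

```python
import torch
import tvm
from tvm import relay

# Trace the already-quantized PyTorch model (quantized_model from the sketch above).
input_shape = (1, 3, 224, 224)
script_module = torch.jit.trace(quantized_model, torch.rand(input_shape)).eval()

# Import the quantized TorchScript model into Relay.
mod, params = relay.frontend.from_pytorch(script_module, [("input", input_shape)])

# Build for a local x86 CPU; an ARM target string would cover Android instead.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm -mcpu=core-avx2", params=params)
```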

Please excuse all my questions; I hope to keep discussing TVM with you smart and kind-hearted folks.




