[[Torch, QNN] Add support for quantized models via QNN #4977](https://github.com/apache/incubator-tvm/pull/4977) gives the performance of quantized Torch models and of the converted TVM quantized models, but does not give a speed comparison between them.
Where could I find more speed comparisons between the two kinds of quantization? (A rough sketch of how I set up each path is below the list.)
1. Converting quantized models from Torch to Relay via QNN (as in #4977).
2. TVM int8 quantization, and TVM int8 quantization + AutoTVM (as discussed in [int8 is slower](https://discuss.tvm.ai/t/the-inference-time-is-longer-after-int8-quantization/3628/3)).
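
For context, this is roughly how I set up the two paths (a minimal sketch; resnet18, the input shape, and the `global_scale` config are just example placeholders for my setup):

```python
import torch
import torchvision
import tvm
from tvm import relay

input_name, ishape = "input", (1, 3, 224, 224)
inp = torch.rand(ishape)

# Path 1: quantize in PyTorch first, then import the quantized
# TorchScript module; the frontend maps it onto QNN ops (as in #4977).
qmodel = torchvision.models.quantization.resnet18(
    pretrained=True, quantize=True).eval()
qscript = torch.jit.trace(qmodel, inp).eval()
mod, params = relay.frontend.from_pytorch(qscript, [(input_name, ishape)])

# Path 2: import the fp32 model, then let TVM quantize it.
fmodel = torchvision.models.resnet18(pretrained=True).eval()
fscript = torch.jit.trace(fmodel, inp).eval()
mod, params = relay.frontend.from_pytorch(fscript, [(input_name, ishape)])
# global_scale is the simplest calibration mode; just an example config.
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    mod = relay.quantize.quantize(mod, params)
```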

I am not sure why int8 is slower than float32, and I see many people getting the same slow results. It is also very hard to find an official tutorial on how to do quantization correctly for PyTorch or TF models, official speed results for quantization, or reports from ordinary users who successfully converted PyTorch/TF models with good accuracy and got the expected speedup.
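
For reference, this is roughly how I time each build (a minimal sketch continuing from `mod`/`params`/`inp` above; the `llvm -mcpu=core-avx2` target is just a placeholder for my own CPU):

```python
import numpy as np
from tvm.contrib import graph_runtime

target = "llvm -mcpu=core-avx2"  # placeholder; set to your actual CPU
with relay.build_config(opt_level=3):
    graph, lib, gparams = relay.build(mod, target, params=params)

ctx = tvm.cpu(0)
rt = graph_runtime.create(graph, lib, ctx)
rt.set_input(input_name, tvm.nd.array(inp.numpy()))
rt.set_input(**gparams)

# Average over repeated runs so fp32 and int8 numbers are comparable.
ftimer = rt.module.time_evaluator("run", ctx, number=10, repeat=3)
res = np.array(ftimer().results) * 1000  # per-repeat means, in milliseconds
print("mean inference time: %.2f ms (std %.2f)" % (res.mean(), res.std()))
```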

Thanks for your kind help!




