[[Torch, QNN] Add support for quantized models via QNN #4977](https://github.com/apache/incubator-tvm/pull/4977) gives performance numbers for quantized Torch models and the converted TVM quantized models, but it does not compare the speed of the two. Where could I find more speed comparisons between these two kinds of quantization:

1. converting quantized models from Torch to Relay via QNN (as #4977 describes), and
2. TVM's own int8 quantization, with and without AutoTVM (as [int8 is slower](https://discuss.tvm.ai/t/the-inference-time-is-longer-after-int8-quantization/3628/3) discusses)?
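To make sure I am describing the same two paths, here is a minimal sketch of what I mean (the resnet18 model, the input name `"input"`, and the `global_scale` value are just illustrative assumptions, not taken from the PR):

```python
import torch
import torchvision
from tvm import relay

input_name = "input"            # illustrative; must match the traced model's input
input_shape = (1, 3, 224, 224)

# Path 1: a model already quantized by PyTorch, imported through the QNN dialect.
qmodel = torchvision.models.quantization.resnet18(pretrained=True, quantize=True).eval()
traced = torch.jit.trace(qmodel, torch.randn(input_shape)).eval()
mod_qnn, params_qnn = relay.frontend.from_pytorch(traced, [(input_name, input_shape)])

# Path 2: a float32 model, quantized to int8 by TVM itself.
fmodel = torchvision.models.resnet18(pretrained=True).eval()
traced = torch.jit.trace(fmodel, torch.randn(input_shape)).eval()
mod_fp32, params_fp32 = relay.frontend.from_pytorch(traced, [(input_name, input_shape)])
with relay.quantize.qconfig(global_scale=8.0):  # global_scale is just an example value
    mod_int8 = relay.quantize.quantize(mod_fp32, params_fp32)
```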
I am not sure why int8 is slower than float32, and I see many people reporting the same slowdown, but it is very hard to find an official tutorial on how to quantize PyTorch or TF models correctly, official speed results for quantization, or any report from a user who converted PyTorch/TF models, kept accuracy, and got the expected speedup. (For concreteness, the timing sketch I would use is at the end of this post.) Thanks for your kind help!
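This is roughly how I would time the compiled modules to compare float32 against int8 (a sketch only; the target string, input name, and shape are assumptions that need adjusting to the actual machine and model):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

def benchmark(mod, params=None, target="llvm -mcpu=core-avx2"):  # adjust -mcpu for your CPU
    # Compile the Relay module for the CPU target.
    with tvm.transform.PassContext(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
    ctx = tvm.cpu(0)
    runtime = graph_runtime.create(graph, lib, ctx)
    runtime.set_input(**params)
    runtime.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
    # time_evaluator runs the whole graph several times and averages the wall time.
    ftimer = runtime.module.time_evaluator("run", ctx, number=10, repeat=3)
    return np.mean(ftimer().results) * 1000.0  # ms per inference

# e.g. compare benchmark(mod_fp32, params_fp32) against benchmark(mod_int8)
# and benchmark(mod_qnn, params_qnn).
```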