@anijain2305, thanks a lot.

I thought a model quantized with TVM's relay quantize pass would perform the same as a TVM model converted from a pre-quantized one.
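
For reference, this is roughly what I mean by the relay quantize path (a minimal sketch; the helper name and the `global_scale` calibration settings are just my assumptions, and `mod`/`params` are an fp32 model already imported into Relay):

```python
from tvm import relay

def quantize_with_relay(mod, params):
    """Rewrite an fp32 Relay module into int8 with TVM's own quantizer."""
    # "global_scale" is the simplest calibrate mode and needs no dataset;
    # a real deployment would likely calibrate on representative inputs.
    with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
        return relay.quantize.quantize(mod, params=params)
```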

I also tested a TVM int8 model converted from a PyTorch QAT model; its speed is the same as the tvm-relay-quantize int8 model.
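
And this is the pre-quantized path (again only a sketch; the input name "input" and the shape are placeholders I chose):

```python
import torch
from tvm import relay

def import_prequantized(quantized_model, input_shape=(1, 3, 224, 224)):
    """Import a PyTorch model that was already quantized (e.g. via QAT),
    so TVM receives int8 ops instead of quantizing the graph itself."""
    quantized_model.eval()
    scripted = torch.jit.trace(quantized_model, torch.rand(input_shape)).eval()
    # Returns (mod, params) ready for relay.build.
    return relay.frontend.from_pytorch(scripted, [("input", input_shape)])
```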

I really have no idea how to get the 1.3x-1.5x speedup with either path, the pre-quantized int8 model or the tvm-relay-quantize int8 model. I am eager for your kind help to reproduce the speedup on an Android ARM device.

It's nice to see your effort on the quantization tutorial!

I also recommend adding more tutorials on how to get the desired int8 speedup over fp32 on the supported device platforms.

Many TVM users are not as experienced as the TVM authors; they may want to see more tutorials or reports on why to choose TVM quantization instead of another DL framework's quantization.
 
Also, more test cases for CPU int8 would be a great help, e.g. which CPU devices support int8 quantization, and how to set a proper target for different kinds of devices. (I have seen many users ask about TARGET usage, but it is still not clear which TARGET setting achieves the best performance for a given device.)
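
For example, is something like the following the right way to set TARGET for an aarch64 Android phone? (A sketch; the `-mattr` features are my guesses and must match the actual SoC, e.g. `+dotprod` is only available from ARMv8.2-A on.)

```python
import tvm
from tvm import relay

# TARGET string for an ARM CPU on Android; adjust -mtriple/-mattr per device.
target = "llvm -device=arm_cpu -mtriple=aarch64-linux-android -mattr=+neon,+v8.2a,+dotprod"

def build_for_android(mod, params):
    """Compile a Relay module for the Android ARM target above."""
    with tvm.transform.PassContext(opt_level=3):
        return relay.build(mod, target=target, params=params)
```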

Thanks again to all of TVM's authors and contributors.




