@anijain2305
Thanks a lot. I had assumed that a model quantized with TVM's relay quantize would perform the same as a TVM model converted from a pre-quantized one. I also tested a TVM int8 model converted from a PyTorch QAT model, and its speed is the same as the relay-quantize int8 model. I really have no idea how to get the 1.3x-1.5x speedup, with either the pre-quantized int8 model or the relay-quantize int8 model. I would be grateful for your help in reproducing the speedup on an Android ARM device; I have attached a rough sketch of the flow I ran at the end of this post.

It is nice to see your effort on the quantization tutorial! I would also recommend more tutorials on how to achieve the expected int8 speedup over fp32 on each supported device platform. Many TVM users are not as experienced as the TVM authors, and they would benefit from tutorials or reports on why to choose TVM quantization over the quantization built into other DL frameworks.

More test cases for CPU int8 would also be a great help: for example, which CPU devices support int8 quantization, and how to set the proper target for different devices. I have seen many users ask about the usage of TARGET, but it is still unclear which TARGET setting achieves the best performance on a given device; the target strings I have tried are also attached below.

Thanks again to all of TVM's authors and contributors.
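For reference, here is roughly the relay quantize flow I ran. This is a minimal sketch: the calibration mode, global scale, and the way `mod`/`params` are obtained are placeholders, not my exact script.

```python
import tvm
from tvm import relay

def quantize_and_build(mod, params, target):
    """Quantize an fp32 Relay module to int8 and build it for `target`."""
    # "global_scale" is the simplest calibration mode; I am not sure it is
    # the one the tutorial recommends.
    with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
        qmod = relay.quantize.quantize(mod, params=params)
    # Build with full optimizations; params were already bound during quantize.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(qmod, target=target)
    return lib
```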
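And these are the kinds of target strings I have been experimenting with. The device attributes are my guesses collected from other threads, not verified best settings, which is exactly what I hope a tutorial could clarify.

```python
# 64-bit Android ARM CPU; -mattr=+neon is my guess at the right SIMD attribute.
target_arm64_android = "llvm -device=arm_cpu -mtriple=aarch64-linux-android -mattr=+neon"

# x86 with AVX2; I have seen this used in the tutorials for fp32.
target_x86_avx2 = "llvm -mcpu=core-avx2"

# x86 with VNNI int8 instructions (Cascade Lake); suggested in other threads
# as the fast path for int8, but I have not been able to verify it myself.
target_x86_vnni = "llvm -mcpu=cascadelake"
```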