According to [this tutorial](https://tvm.apache.org/docs/tutorials/frontend/deploy_prequantized.html?highlight=calibration), if we aim to convert models to 8 bit, we can import a framework-prequantized model (carrying our own quantization information) into TVM. However, frameworks like PyTorch do not support quantization at bit widths lower than 8.
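For reference, the import flow from that tutorial looks roughly like the sketch below; the model choice and input name are just illustrative:

```python
import torch
import torchvision
from tvm import relay

# A PyTorch model prequantized to 8 bit (PyTorch does not go lower).
model = torchvision.models.quantization.resnet18(pretrained=True, quantize=True).eval()

# Trace and import into TVM; the scales and zero points computed by
# PyTorch travel with the model into the Relay QNN graph.
inp = torch.randn(1, 3, 224, 224)
script_module = torch.jit.trace(model, inp).eval()
mod, params = relay.frontend.from_pytorch(script_module, [("input", inp.shape)])
```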
Another way might be to convert the float model to TVM first and use `quantize_relay_module` to quantize the float Relay model. However, this approach uses TVM's own calibration algorithms (e.g. KL divergence) to compute the quantization scales and zero points, which may lead to a larger accuracy drop. So is there any way to pass our own quantization scales and zero points when quantizing models to fewer than 8 bits?
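To make the second path concrete: as far as I can tell, `relay.quantize.qconfig` does expose bit widths below 8, but it has no parameter for user-supplied scales or zero points, only a choice of calibration mode. A minimal sketch, assuming `mod` and `params` come from a float model import and `calibration_dataset` is a placeholder for an iterable of input feed dicts:

```python
from tvm import relay

# nbit_input/nbit_weight can go below 8, but the scales are still chosen
# by TVM's calibration (global_scale, kl_divergence, ...), not by the user.
with relay.quantize.qconfig(nbit_input=4,
                            nbit_weight=4,
                            calibrate_mode="kl_divergence",
                            weight_scale="max"):
    # calibration_dataset: hypothetical placeholder for calibration batches.
    qmod = relay.quantize.quantize(mod, params, dataset=calibration_dataset)
```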