According to [this tutorial](https://tvm.apache.org/docs/tutorials/frontend/deploy_prequantized.html?highlight=calibration), if we aim to convert models to 8 bit, we can convert a framework-prequantized model (carrying my own quantization information) to TVM. However, frameworks like PyTorch do not support quantization bit-widths lower than 8.
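
For reference, the framework-prequantized path looks roughly like the sketch below (assuming a quantized torchvision ResNet; the model, input name, and shape are placeholders). The frontend picks up the scales and zero points from the quantized TorchScript graph, but only at 8 bits:

```python
import torch
from torchvision.models.quantization import resnet18
from tvm import relay

# Quantize in PyTorch first; eager-mode quantization here is int8 only.
model = resnet18(pretrained=True, quantize=True).eval()

inp = torch.rand(1, 3, 224, 224)
script_module = torch.jit.trace(model, inp).eval()

# The frontend reads the scales and zero points baked into the quantized
# TorchScript graph and emits the corresponding QNN ops in Relay.
mod, params = relay.frontend.from_pytorch(script_module, [("input", inp.shape)])
```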

Another way might be to convert the float model to TVM first and use `quantize_relay_module` to quantize the float Relay model. However, this approach uses TVM's own quantization algorithms (e.g. KL divergence) to calculate the quantization scales and zero points, which may lead to a larger accuracy drop.
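
That path looks roughly like this minimal sketch, assuming `mod`/`params` are a float Relay module and `calib_dataset` is a calibration-input iterator (both placeholders); I'm using the public `relay.quantize.quantize` API here, which I believe covers the same automatic-quantization flow. Note the bit-widths are configurable below 8, but the scales still come from TVM's own calibration rather than from the user:

```python
from tvm import relay

# mod, params: a float Relay module imported from a framework (assumed).
# calib_dataset: an iterator yielding calibration inputs (assumed).
with relay.quantize.qconfig(
    calibrate_mode="kl_divergence",  # TVM searches for scales itself (e.g. KL)
    nbit_input=4,                    # sub-8-bit widths can be configured...
    nbit_weight=4,                   # ...but scales/zero points are not user-supplied
    skip_conv_layers=[0],
):
    qmod = relay.quantize.quantize(mod, params=params, dataset=calib_dataset)
```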

So, are there any ways to pass my own quantization scales and zero points to TVM when quantizing models to bit-widths lower than 8?




