This is a common problem: quantized models may introduce many additional operators (for example, quantize/dequantize steps around each computation), which can outweigh the savings from cheaper integer arithmetic.
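To make the overhead concrete, here is a minimal illustrative sketch (plain NumPy, not TVM code) of the affine quantize/dequantize pair that typically gets inserted around operators in an 8-bit model. The function names and the scale/zero-point values are assumptions for illustration only:

```python
# Illustrative only: why int8 quantization adds operators to a graph.
# Each float op commonly gets wrapped in quantize/dequantize (or requantize)
# steps, so the graph grows even though the core math runs in int8.
import numpy as np

def quantize(x, scale, zero_point):
    """Map float32 values to int8 with an affine scheme."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Map int8 values back to float32."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.5, -1.2, 3.3], dtype=np.float32)
scale, zp = 0.05, 0           # hypothetical calibration values
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
# x_hat is close to x, but every such round trip is an extra pair of
# operators at runtime unless the compiler can fold or eliminate them.
```

Graph-level optimizations can often fuse or cancel adjacent dequantize/quantize pairs, which is exactly the kind of rewrite that helps here.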

There are indeed some optimizations that can be done for quantized models. Recently I have been working on some **computational graph level** optimizations, which will hopefully be upstreamed to the main branch within a month. For now, if you are interested, you can try running your model with [this](https://github.com/apache/tvm/pull/15402) PR and see if there is a significant improvement.





---
[Visit Topic](https://discuss.tvm.apache.org/t/slower-execution-times-after-8-bit-quantization/15502/3) to respond.
