This is currently a common problem: quantized models may introduce many additional operators.
There are indeed optimizations that can be done for quantized models. I am currently working on some **computational graph level** optimizations, which will hopefully be upstreamed to the main branch within a month. For now, if you are interested, you can try running your model with [this](https://github.com/apache/tvm/pull/15402) version and see whether there is a significant improvement.