I think we are getting confused because of the overloaded term "quantization". To be precise, maybe we can stick to the following terms:
* *QNN Dialect* - The framework (TF/PyTorch/MXNet) performs quantization. The Relay parser reads this pre-quantized model and creates a QNN-dialect graph. QNN ops are wrappers that are lowered to sequences of existing Relay operators.
* *Relay Automatic Quantization* - Takes an FP32 Relay model, quantizes it, and produces a Relay graph with integer datatypes.
* *Bring Your Own Codegen Quantizer* - Here the hardware vendor has its own quantization flow, because the accelerator can have restrictions that are not suitably reflected in Relay Automatic Quantization or framework quantization. **This RFC is for this category.**

These three options differ in where quantization happens. In QNN, it happens at one extreme - the framework. In BYOCQ, it happens at the other extreme - the codegen. Relay Automatic Quantization sits in between.

This RFC is for the BYOC quantizer. In this case, the Relay graph that goes to the codegen is FP32; Relay does not even know that the codegen is going to perform quantization. However, the external codegen needs the input/output tensor values of each subgraph to perform calibration later. This RFC discusses the API and flow to collect that data (see the sketch at the end of this post).

@weberlo Hopefully this gives some context. You are right that we should think about what is missing in Relay Automatic Quantization to enable more hardware-aware quantization. At the same time, there are hardware vendors that have their own mature codegen toolchains and want to reuse them as much as possible.
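To make the intended flow concrete, here is a rough sketch under some assumptions: `get_calibration_data` stands for the kind of helper this RFC proposes (its exact name and signature are what is under discussion), `"my_accel"` is a placeholder BYOC target whose supported ops are assumed to be registered already, and `vendor_calibrate` is a stand-in for the vendor's own toolchain.

```python
import numpy as np
import tvm
from tvm import relay
# The helper this RFC proposes; the final name/location may differ.
from tvm.relay.analysis import get_calibration_data

# 1. An FP32 Relay module; Relay itself stays in float32 throughout.
x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
w = relay.var("w", shape=(16, 3, 3, 3), dtype="float32")
y = relay.nn.relu(relay.nn.conv2d(x, w, kernel_size=(3, 3), padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# 2. Partition the graph for the external codegen. This assumes the ops
#    have already been annotated as supported by the "my_accel" target
#    through the usual BYOC annotation rules.
mod = relay.transform.AnnotateTarget("my_accel")(mod)
mod = relay.transform.MergeCompilerRegions()(mod)
mod = relay.transform.PartitionGraph()(mod)

# 3. Run a few representative batches through the FP32 module and record
#    the input/output tensors of every partitioned subgraph -- the data
#    the external codegen needs for its own calibration.
calib_records = []
for _ in range(8):
    inputs = {
        "x": np.random.rand(1, 3, 224, 224).astype("float32"),
        "w": np.random.rand(16, 3, 3, 3).astype("float32"),
    }
    calib_records.append(get_calibration_data(mod, inputs))  # proposed API

# 4. Hand the recorded tensors to the vendor's own quantization/codegen
#    toolchain (purely illustrative placeholder).
vendor_calibrate(calib_records)
```

The key point of the sketch is that Relay only ever sees FP32; the recorded subgraph tensors are what the vendor's toolchain uses to pick scales, bit widths, etc. on its own terms.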