Yes, you can also view the domain conversion minimization as an optimization pass here. The resulting graph is, to some extent, semantically equivalent to the original one that converts to f32 and back. The idea is that we can be smarter when lowering qnn ops into the relay sequence.
For example, when lowering the `qconv2d -> qrelu` sequence, we don't have to convert the result of qconv2d to f32 and then back to i8; the intermediate values can be represented directly in the i8 domain. The mechanism in the current realize stage might help in this case. TVM's current quantizer also separates two steps: we always first make the choice (here, that step was already done by the other framework), and then decide how best to translate it into low-level operators (the realize stage in quantization). The realize stage in the current quantization code would serve as a good reference.
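To make the equivalence concrete, here is a minimal sketch in plain Python (not TVM code). It assumes the relu input and output share the same scale and zero point, and every name in it (`dequantize`, `quantize`, `relu_via_f32`, `relu_in_i8`, `SCALE`, `ZERO_POINT`) is hypothetical, chosen only for illustration. Under that assumption, relu in the i8 domain reduces to a max against the zero point, and it agrees with the dequantize, relu, quantize round trip for every i8 value.

```python
# Sketch only: hypothetical helper names, not the TVM/relay API.
# Assumption: relu's input and output use the same scale and zero point.

def dequantize(q, scale, zero_point):
    """Map an i8 value to its real (f32-domain) value."""
    return scale * (q - zero_point)

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value back to i8, rounding and saturating."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def relu_via_f32(q, scale, zero_point):
    """Naive lowering: i8 -> f32 -> relu -> i8."""
    x = dequantize(q, scale, zero_point)
    return quantize(max(x, 0.0), scale, zero_point)

def relu_in_i8(q, zero_point):
    """Direct lowering: real 0.0 corresponds to zero_point in i8."""
    return max(q, zero_point)

SCALE = 0.25       # illustrative value (power of two keeps the float math exact)
ZERO_POINT = -3    # illustrative value

# Compare both lowerings over the whole i8 range.
mismatches = [
    q for q in range(-128, 128)
    if relu_via_f32(q, SCALE, ZERO_POINT) != relu_in_i8(q, ZERO_POINT)
]
print("mismatching inputs:", mismatches)
```

If the input and output quantization parameters differ, the direct form would additionally need an integer requantize step, which this sketch does not cover.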