Yes, you can also view the domain conversion minimization as an optimization 
pass here. The resulting graph is, to some extent, semantically equivalent to 
the original one that converts to f32 and back.  The idea is that we can be 
smarter when lowering qnn ops into the relay sequence.

For example, when lowering the `qconv2d -> qrelu` sequence, we don't have to 
convert the result of qconv2d to f32 and then back to i8. Both ops can be 
represented directly in the i8 domain, with no round trip through f32.  The 
mechanism in the current realize pass might help in this case.
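
To make the qrelu case concrete, here is a minimal sketch in plain Python (not TVM code). It assumes the input and output of qrelu share the same scale and zero point, and that quantization is affine with i8 clamping. All function names are hypothetical and chosen only for illustration. Under those assumptions, relu in the i8 domain reduces to `max(q, zero_point)`, and the sketch checks this against the f32 round trip for every i8 value.

```python
# Illustrative only: plain Python, not the TVM/relay API.
# Assumption: qrelu input and output use the same scale and zero point.

def dequantize(q, scale, zero_point):
    """Map an i8 value to its real (f32-domain) value."""
    return scale * (q - zero_point)

def quantize(x, scale, zero_point):
    """Map a real value to i8, clamping to the i8 range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def relu_via_f32(q, scale, zero_point):
    """Naive lowering: i8 -> f32 -> relu -> i8."""
    x = dequantize(q, scale, zero_point)
    return quantize(max(0.0, x), scale, zero_point)

def relu_in_i8(q, zero_point):
    """Direct lowering: real zero corresponds to zero_point in i8."""
    return max(q, zero_point)

scale, zero_point = 0.05, -3  # arbitrary example parameters

# Compare both lowerings over the whole i8 range.
mismatches = [
    q for q in range(-128, 128)
    if relu_via_f32(q, scale, zero_point) != relu_in_i8(q, zero_point)
]
```

If the two lowerings agree, `mismatches` is empty, which is the sense in which the i8-only graph is equivalent to the one that goes through f32. When the output scale or zero point differs from the input, an extra requantize step would be needed; that case is not covered here.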

There are also two separate steps in TVM's current quantizer. We always make 
the quantization choice first (for qnn, this step was already done by the other 
frameworks), and then decide how best to translate it into low-level operators 
(the realize stage in quantization). The realize stage in the current 
quantization code would serve as a good reference.

