OK, let's try to finalize the high-level design points. Let's first discuss the namespace.

# Namespace for the TFLite-style quantize dialect

### Requirements
* The dialect should support both symmetric and asymmetric quantization.
* These ops should never go through codegen. They will be lowered to low-level 
Relay ops (like the existing conv, round, etc.) using FForwardRewrite or a 
similar piece of Relay infrastructure.
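
To make the two requirements concrete, here is a minimal sketch in plain Python (not Relay) of how a quantize op could decompose into simple elementwise steps: divide, round, add, clip. The function names, the `scale`/`zero_point` parameters, and the int8 default range are assumptions for illustration, not the proposed API. Symmetric quantization is simply the `zero_point = 0` case.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Hypothetical helper: q = clip(round(x / scale) + zero_point, qmin, qmax)."""
    out = []
    for x in values:
        # Python's round() rounds ties to even; a real lowering would have to
        # pick the rounding mode explicitly.
        q = round(x / scale) + zero_point   # divide, round, add
        q = max(qmin, min(qmax, q))         # clip to the integer range
        out.append(int(q))
    return out


def dequantize(qvalues, scale, zero_point):
    """Hypothetical helper: x ~= scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qvalues]


# Symmetric: zero_point is 0, so real 0.0 maps to integer 0.
symmetric = quantize([0.0, 0.5, -0.5], scale=0.25, zero_point=0)

# Asymmetric: a non-zero zero_point shifts the range (here uint8-style).
asymmetric = quantize([0.0, 0.5], scale=0.25, zero_point=128, qmin=0, qmax=255)
```

Each step above corresponds to an op that already exists at the low level, which is why the dialect ops themselves would not need their own compute or codegen.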

### Proposal

How about using `relay.op._quantization` as the namespace? So, the operations 
can be `relay.op._quantization.conv2d` or `relay.op._quantization.quantize`.

### Pros

* Separation of concerns - Restricts the number of ops for which TVM compute 
has to be written.
* Good readability/debugging - Framework parsing will be easier compared to 
directly lowering to low-level Relay ops. Also, one can look at the quantized 
annotation ops and understand the quantization flow.

### Cons
* Getting the best performance might require some new Relay passes, such as a 
peephole optimizer or some more complicated fusion. (Symmetric quantization 
might already work very well with the existing Relay infrastructure. 
Asymmetric quantization will most probably need more effort.)
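
One likely reason asymmetric quantization needs more work can be seen from the arithmetic alone. The sketch below, in plain Python with hypothetical function names, expands an integer dot product with zero points `za` and `zb`. The asymmetric case produces three extra terms next to the plain multiply-accumulate, and these are the kind of subexpressions a peephole or fusion pass would have to simplify or fold. With both zero points at 0, they vanish.

```python
def int_dot_reference(qa, qb, za, zb):
    """Direct form: sum((a - za) * (b - zb))."""
    return sum((a - za) * (b - zb) for a, b in zip(qa, qb))


def int_dot_expanded(qa, qb, za, zb):
    """Expanded form: the first term is the plain integer dot product,
    the remaining three exist only because of the zero points."""
    n = len(qa)
    return (sum(a * b for a, b in zip(qa, qb))   # plain multiply-accumulate
            - zb * sum(qa)                       # correction for zb
            - za * sum(qb)                       # correction for za
            + n * za * zb)                       # constant offset


qa, qb = [130, 120, 128], [3, 5, 7]
asym = int_dot_expanded(qa, qb, za=128, zb=4)
sym = int_dot_expanded(qa, qb, za=0, zb=0)   # reduces to the plain dot product
```

When the weights are constant, the terms involving only `qb`, `za`, and `zb` can be computed ahead of time, which is one place existing constant folding might already help.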

Let me know your thoughts on this. Once we reach consensus, I can start 
prototyping these operators with stub implementations.

