OK, let's try to finalize the high-level design points. Let's start with the namespace.

# Namespace for the TFLite quantize-style dialect

### Requirements

* This should support both symmetric and asymmetric quantization.
* These ops should never go through codegen. They will be lowered to low-level Relay ops (such as the existing conv, round, etc.) using FForwardRewrite or a similar piece of Relay infrastructure.

### Proposal

How about using `relay.op._quantization` as the namespace? The operations would then be, for example, `relay.op._quantization.conv2d` or `relay.op._quantization.quantize`.

### Pros

* Separation of concerns: restricts the number of ops for which TVM compute has to be written.
* Good readability/debugging: framework parsing will be easier compared to directly lowering to low-level Relay ops. Also, one can look at the quantized annotation ops and understand the quantization flow.

### Cons

* Getting the best performance might require some new Relay passes. It might require working on a peephole optimizer or some complicated fusion. (Symmetric quantization might already work very well with the existing Relay infrastructure. Asymmetric will most probably need more effort.)

Let me know your thoughts on this. Once we reach consensus, I can start prototyping these operators with stub implementations.
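
To make the "lowered to low-level ops" requirement a bit more concrete, here is a rough sketch of what a `quantize` op could decompose into. This is plain Python, not Relay: the function names, the parameters (`scale`, `zero_point`, `qmin`, `qmax`), and the exact op sequence are my assumptions for illustration only, and the rounding mode is simply Python's built-in `round`. The point is that symmetric quantization is the special case `zero_point = 0`, which is why it may map onto existing ops more easily.

```python
# Hypothetical sketch only: none of these names are TVM/Relay APIs.
# Plain Python stands in for the low-level ops (divide, round, add, clip)
# that a high-level quantize op could be rewritten into.

def quantize(values, scale, zero_point=0, qmin=-128, qmax=127):
    """Decompose quantize into divide -> round -> add zero point -> clip."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point    # divide, round, shift
        out.append(max(qmin, min(qmax, q)))  # clip to the integer range
    return out


def dequantize(qvalues, scale, zero_point=0):
    """Inverse mapping: subtract the zero point, then multiply by scale."""
    return [(q - zero_point) * scale for q in qvalues]


data = [-1.0, 0.0, 0.25, 1.0]
scale = 1.0 / 64  # power of two, so the arithmetic below is exact

# Symmetric: zero_point is 0, signed int8 range.
symmetric = quantize(data, scale)

# Asymmetric: non-zero zero_point, unsigned uint8 range.
asymmetric = quantize(data, scale, zero_point=128, qmin=0, qmax=255)

print(symmetric)   # [-64, 0, 16, 64]
print(asymmetric)  # [64, 128, 144, 192]
```

In the asymmetric case the zero point shows up as an extra add here, and it would show up as extra terms inside something like a quantized conv2d, which is where I expect the additional fusion or peephole work to come from.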