> Hi @anijain2305, regarding the requantization: if it is not going to be put in the conv op, the op should output FP32, otherwise the semantics are confusing. The requantization can convert FP32 to INT8. The multiplier/shift based requantization approach introduced by TFLite is also adopted by Caffe2/QNNPACK.
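For anyone else following the thread, that multiplier/shift scheme boils down to something like the following. This is a minimal Python sketch of the gemmlowp/TFLite-style approach, not TVM code; the helper names are mine, and real kernels implement the rounding with saturating fixed-point intrinsics rather than big-integer arithmetic:

```python
import math

def quantize_multiplier(real_multiplier):
    """Decompose 0 < real_multiplier < 1 into a fixed-point int32
    multiplier in [2**30, 2**31) and a right shift, so that
    real_multiplier ~= multiplier * 2**(-31 - shift)."""
    assert 0 < real_multiplier < 1
    mantissa, exponent = math.frexp(real_multiplier)  # mantissa in [0.5, 1)
    multiplier = int(round(mantissa * (1 << 31)))
    shift = -exponent
    if multiplier == (1 << 31):  # rounding can push the mantissa up to 1.0
        multiplier //= 2
        shift -= 1
    return multiplier, shift

def requantize(acc, multiplier, shift, output_zero_point):
    """Requantize one int32 accumulator value down to int8."""
    # Rounding high multiply: (acc * multiplier) / 2**31, round to nearest.
    scaled = (acc * multiplier + (1 << 30)) >> 31
    # Rounding right shift by `shift`.
    if shift > 0:
        scaled = (scaled + (1 << (shift - 1))) >> shift
    out = scaled + output_zero_point
    return max(-128, min(127, out))  # saturate to the int8 range
```

Here `real_multiplier` would be `input_scale * kernel_scale / output_scale`, which is typically in (0, 1), so the whole FP32 rescale is replaced by one integer multiply and two shifts.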
Makes sense. Does it make sense to add `accumulator_dtype` as one of the attributes of `quantized_conv2d` (see the sketch at the end of this comment)? It would be set to int32 for TFLite, Caffe2, and QNNPACK, but if some network needs accumulation in FP32, the op could support that as well.

> And, maybe we can put the quantization parameters in the tensor, as the scale and zero point describe the INT8 tensor data rather than the op. The ops are supposed to read these parameters and get things done.

Not sure about this. The good thing is that the conv2d Relay operator could be shared across FP32 and quantized tensor types. The bad thing is that the compute would then depend on the quantized tensor type, which might require new Relay optimizations and prevent us from fully reusing the existing infrastructure.
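To make the `accumulator_dtype` proposal concrete, here is a plain-Python mock-up of the attributes the op might carry. This is purely illustrative; none of these names are an existing TVM/Relay API:

```python
from dataclasses import dataclass

# Illustrative only: a mock-up of the attributes the proposed
# quantized_conv2d might carry. Not an existing TVM/Relay API.
@dataclass
class QuantizedConv2DAttrs:
    input_zero_point: int
    kernel_zero_point: int
    input_scale: float
    kernel_scale: float
    accumulator_dtype: str = "int32"  # int32 for TFLite/Caffe2/QNNPACK
    out_dtype: str = "int8"           # "float32" if requantization stays
                                      # outside the op

# A network that needs FP32 accumulation would just set:
attrs = QuantizedConv2DAttrs(
    input_zero_point=128, kernel_zero_point=127,
    input_scale=0.0078, kernel_scale=0.0039,
    accumulator_dtype="float32", out_dtype="float32",
)
```

Keeping the dtype as an explicit attribute (rather than baking int32 in) leaves the lowering free to pick the accumulator per target while the default still matches the TFLite/Caffe2/QNNPACK semantics.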