Hi @anijain2305, regarding the requantization: if it is not going to be put into the conv op, the conv op should output FP32, otherwise the semantics are confusing. The requantization can then convert FP32 to INT8. The multiplier/shift based requantization approach introduced by TFLite is also adopted by Caffe2/QNNPACK.
Also, maybe we can put the quantization parameters in the tensor, as the *scale* and *zero point* describe the INT8 tensor data rather than the op. The ops would then read these parameters and get things done.