Some comments on @anijain2305 's [reply](https://github.com/dmlc/tvm/issues/2351#issuecomment-496998142) :)

> > Hi @anijain2305, regarding the requantization: if it is not going to be put in the conv op, the op should output FP32, otherwise the semantics are confusing. The requantization can convert FP32 to INT8. The multiplier/shift based requantization approach introduced by TFLite is also adopted by Caffe2/QNNPACK.
>
> Makes sense. Does it make sense to add `accumulator_dtype` as one of the attributes of `quantized_conv2d`? This will be set to int32 for TFLite, Caffe2, QNNPACK. But, if some network needs accumulation in FP32, then it will support that as well.

A network uses operators (or layers, or whatever we'd like to call them) regardless of the *accumulation format*; the format is part of a software system's internal mechanism. So I'd say we don't need an `accumulator_dtype`, and `out_dtype` is what we want. The real discussion is about whether we put requantization inside the conv2d op. (I've attached a small sketch of the multiplier/shift scheme at the end of this comment.)

> > And, maybe we can put the quantization parameters in the tensor, as the scale and zero point describe the INT8 tensor data rather than the op. The op is supposed to read these parameters and get things done.
>
> Not sure about this. The good thing is the conv2d relay operator can be shared across FP32 and quantized tensor types. The bad thing is compute depends on the quantized tensor type now. This might require new Relay optimizations, preventing us from fully using the existing infrastructure.

I was suggesting extending the existing tensor type rather than introducing a new one, so I assume this won't lead to new Relay optimizations :)
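
For concreteness, here is a minimal NumPy sketch of the multiplier/shift requantization used by TFLite/QNNPACK, assuming per-tensor quantization and signed int8 output. The rounding here is simplified relative to gemmlowp's exact round-half-away-from-zero, and the helper names are mine, not from any of those codebases:

```python
import numpy as np

def quantize_multiplier(m):
    """Decompose a real multiplier 0 < m < 1 into a Q0.31 fixed-point
    multiplier q and a right shift s, so that m ~= q * 2**-(31 + s).
    (TFLite also handles m >= 1 with a left shift; omitted here.)"""
    assert 0.0 < m < 1.0
    s = 0
    while m < 0.5:            # normalize into [0.5, 1)
        m *= 2.0
        s += 1
    q = int(round(m * (1 << 31)))
    if q == (1 << 31):        # rounding can push us up to 1.0; renormalize
        q //= 2
        s -= 1
    return q, s

def requantize(acc, input_scale, weight_scale, output_scale, output_zp):
    """Map int32 accumulators to int8 using integer-only runtime ops."""
    q, s = quantize_multiplier(input_scale * weight_scale / output_scale)
    acc = acc.astype(np.int64)   # int32 acc * Q0.31 multiplier needs 64 bits
    total = 31 + s
    rounded = (acc * q + (1 << (total - 1))) >> total  # rounding right shift
    return np.clip(rounded + output_zp, -128, 127).astype(np.int8)

# Example: map some int32 accumulators to int8.
acc = np.array([-120000, -5, 0, 37, 250000], dtype=np.int32)
print(requantize(acc, input_scale=0.02, weight_scale=0.005,
                 output_scale=0.1, output_zp=0))
# -> [-120   0    0    0  127]
```

Note that only `quantize_multiplier` touches floating point, and it runs once at compile time; the per-element path is pure integer arithmetic, which is the whole point of the scheme.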
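And a sketch of what I mean by "extending the existing tensor type" with quantization parameters. Everything below is hypothetical (it is not the actual Relay `TensorType` API); it only illustrates that scale/zero point travel with the tensor, so a single conv2d can read them from its inputs instead of carrying them as op attributes:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TensorType:
    """Hypothetical tensor type: quantization parameters are optional
    fields of the tensor, not attributes of any op."""
    shape: Tuple[int, ...]
    dtype: str                        # "float32", "int8", "int32", ...
    scale: Optional[float] = None     # None for ordinary FP32 tensors
    zero_point: Optional[int] = None

def conv2d_out_type(data: TensorType, weight: TensorType,
                    out_shape: Tuple[int, ...]) -> TensorType:
    """The op reads the parameters from its inputs and 'gets things done':
    an int8 x int8 conv accumulates into int32 whose effective scale is
    the product of the input scales (zero points assumed 0 here for
    brevity); FP32 inputs pass through unchanged."""
    if data.dtype == "int8" and weight.dtype == "int8":
        return TensorType(out_shape, "int32",
                          scale=data.scale * weight.scale, zero_point=0)
    return TensorType(out_shape, data.dtype)

# The same conv2d signature serves both worlds:
fp32 = conv2d_out_type(TensorType((1, 32, 56, 56), "float32"),
                       TensorType((64, 32, 3, 3), "float32"),
                       (1, 64, 54, 54))
int8 = conv2d_out_type(TensorType((1, 32, 56, 56), "int8", 0.02, 0),
                       TensorType((64, 32, 3, 3), "int8", 0.005, 0),
                       (1, 64, 54, 54))
print(fp32.dtype, int8.dtype, int8.scale)   # float32 int32 0.0001
```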
> > Hi @anijain2305 regarding the requantization, if the it is not going to put > > in conv op, the op may suppose to output FP32, otherwise the semantic is > > confusing. The requantization can convert FP32 to INT8. The > > multiplier/shift based reuantization approach introduced by TFLite is also > > adopted by Caffe2/QNNPACK. > > Makes sense. Does it make sense to add accumulator_dtype as one of the > attributes of quantized_conv2d. This will be set to int32 for TFLite, Caffe2, > QNNPACK. But, if some network needs accumulation in FP32, then it will > support that as well. A network uses operators (or layers or anything we'd like to call it) regardless of the *accumulation format*. The format is part of a software system's mechanism. So, I guess we don't need a `accumulator_dtype` and the `out_dtype` is what we want. The discussion is about whether we put requantization inside the conv2d op. > > And, maybe we can put the quantization parameters in tensor, as the scale > > and zero point are describing the INT8 tensor data rather than the op. The > > op are supposed to read these parameters and get things done. > > Not sure about this. The good thing is the conv2d relay operator can be > shared across FP32 and quantized tensor types. The bad thing is compute > depends on the quantized tensor type now. This might require new Relay > optimizations, preventing us to fully use the existing infrastructure. I was saying extending existing tensor rather than introduce new tensor type. I assume that this won't lead to new Relay opt :) -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-497232401