Some comments on @anijain2305's 
[reply](https://github.com/dmlc/tvm/issues/2351#issuecomment-496998142) :)

> > Hi @anijain2305, regarding the requantization: if it is not going to be put 
> > in the conv op, the op should output FP32, otherwise the semantics are 
> > confusing. The requantization converts FP32 to INT8. The 
> > multiplier/shift-based requantization approach introduced by TFLite is also 
> > adopted by Caffe2/QNNPACK.
> 
> Makes sense. Does it make sense to add accumulator_dtype as one of the 
> attributes of quantized_conv2d? It would be set to int32 for TFLite, Caffe2, 
> and QNNPACK. But if some network needs accumulation in FP32, that would be 
> supported as well.

A network uses operators (or layers, or whatever we'd like to call them) 
regardless of the *accumulation format*; that format is part of a software 
system's internal mechanism. So I'd say we don't need an `accumulator_dtype`, 
and `out_dtype` is what we want. The real question is whether we put 
requantization inside the conv2d op.
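
For context, here is a minimal NumPy sketch of the multiplier/shift requantization mentioned above (the names and example scales are illustrative only, not any TVM or TFLite API): the FP32 rescale factor is folded offline into an int32 fixed-point multiplier plus a right shift, so inference stays purely in integer arithmetic.

```python
import numpy as np

def quantize_multiplier(real_multiplier):
    # Decompose 0 < real_multiplier < 1 into a fixed-point int32
    # multiplier and a right shift, TFLite-style.
    assert 0.0 < real_multiplier < 1.0
    shift = 0
    while real_multiplier < 0.5:
        real_multiplier *= 2.0
        shift += 1
    return int(round(real_multiplier * (1 << 31))), shift

def requantize(acc, in_scale, w_scale, out_scale, out_zero_point):
    # acc: int32 accumulators from an int8 conv2d.
    # Fold the three FP32 scales into one integer multiplier + shift.
    m, shift = quantize_multiplier(in_scale * w_scale / out_scale)
    # High multiply with rounding: approximately round(acc * m / 2^31).
    scaled = (acc.astype(np.int64) * m + (1 << 30)) >> 31
    # Rounding right shift by `shift`.
    if shift > 0:
        scaled = (scaled + (1 << (shift - 1))) >> shift
    return np.clip(scaled + out_zero_point, -128, 127).astype(np.int8)

# Toy usage: rescale int32 accumulators into the int8 output domain.
acc = np.array([1234, -5678, 40000], dtype=np.int32)
print(requantize(acc, in_scale=0.02, w_scale=0.005,
                 out_scale=0.1, out_zero_point=0))
```

Whether this runs as a separate requantize op after an int32-output conv2d, or gets fused inside conv2d itself, is exactly the design question here.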

> > And maybe we can put the quantization parameters in the tensor, since the 
> > scale and zero point describe the INT8 tensor data rather than the op. The 
> > ops are then supposed to read these parameters and get things done.
> 
> Not sure about this. The good thing is that the conv2d Relay operator can be 
> shared across FP32 and quantized tensor types. The bad thing is that the 
> compute now depends on the quantized tensor type. This might require new 
> Relay optimizations, preventing us from fully using the existing 
> infrastructure.

I was suggesting extending the existing tensor rather than introducing a new 
tensor type. I assume this won't lead to new Relay optimizations :)
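
To make the idea concrete, here is a toy sketch of what "extending the existing tensor" could look like (illustrative Python only, not a proposal for Relay's actual type system):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class QuantizedTensor:
    # An INT8 payload that carries its own quantization params, so an
    # op reads scale/zero_point from its inputs rather than taking
    # them as operator attributes.
    data: np.ndarray   # int8 values
    scale: float       # real_value = scale * (q - zero_point)
    zero_point: int

    def dequantize(self) -> np.ndarray:
        return self.scale * (self.data.astype(np.float32) - self.zero_point)
```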


