> > > For `q_conv2d`, we will add two more arguments:
> > > ```python
> > >   output_min=0,
> > >   output_max=0
> > > ```
> > >
> > > These will be used to restrict the output range, which can be
> > > calculated in advance.
> > 
> > 
> > I see what you are saying, but I am not sure this is the right approach.
> > In my opinion, it would be better to keep this outside of conv. The reason
> > we have these two extra min/max arguments is the fused activation in
> > TFLite. It seems better to keep it separate so that both MXNet and TFLite
> > can share quantized_conv2d. In the case of TFLite, when we see a fused
> > conv, we can add one more clamp operator at the end of the sequence of ops.
> 
> Whether or not we have a fused activation function, we always need
> output_min / output_max. The conv produces an int32 result, but we need a
> uint8 result, so we must restrict int32 to uint8. If there is no fused
> activation function (quantized TFLite models often have none), output_min /
> output_max will be 0 / 255 to restrict the int32 result. If we have relu6,
> output_min / output_max will be 0 / 6. So I think it is better to put these
> two into the conv arguments. We can then avoid producing another clamp: the
> restriction simply happens during conv2d's int32 -> uint8 requantize step,
> which is natural.

In the case where the activation is not fused, the values have to be clamped to
0/255, i.e. the uint8 range, which is exactly what out_dtype already tells us.
So quantized_conv2d does not need any extra information beyond out_dtype to go
back to uint8/int8. Correct?
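To make that concrete, here is a minimal plain-Python sketch of the idea that the default clamp range follows directly from out_dtype (the helper names `dtype_range` and `requantize_clamp` are hypothetical, not TVM API):

```python
def dtype_range(dtype):
    """Representable (min, max) for a small integer dtype."""
    return {"uint8": (0, 255), "int8": (-128, 127)}[dtype]

def requantize_clamp(acc, out_dtype="uint8"):
    """Clamp int32 accumulator values into the range implied by out_dtype.

    No extra min/max arguments are needed here: the default clamp
    range is determined entirely by out_dtype.
    """
    lo, hi = dtype_range(out_dtype)
    return [min(max(v, lo), hi) for v in acc]

# e.g. requantize_clamp([-5, 0, 130, 300]) -> [0, 0, 130, 255]
```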

Now, if the activation is fused, I agree that we will have two clamps: one
inside quantized_conv2d (0/255), and one for the relu6 (0/6). I think this is
fine. We can also write a Relay pass that replaces two back-to-back clamps with
a single clamp operator.
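The folding rule for two back-to-back clamps is simple enough to state exactly. Here is a plain-Python sketch (not actual Relay pass code; `merge_clamps` is a hypothetical helper name):

```python
def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return min(max(x, lo), hi)

def merge_clamps(first, second):
    """Fold clamp(clamp(x, *first), *second) into a single clamp range.

    The merged lower bound is whatever the first clamp's lower bound maps
    to under the second clamp, and likewise for the upper bound. This is
    exact even when the two ranges do not overlap.
    """
    lo1, hi1 = first
    lo2, hi2 = second
    return (clamp(lo1, lo2, hi2), clamp(hi1, lo2, hi2))

# Folding the conv's uint8 clamp (0, 255) with a fused relu6 clamp (0, 6):
# merge_clamps((0, 255), (0, 6)) -> (0, 6)
```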

The reason I am saying this is that TFLite chooses one way to handle things,
which other frameworks might not. So, it is necessary to come up with the right
abstractions first. The performance can then be recovered by writing Relay
passes.

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-497057012
