Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Zhao Wu
> > Not true. When there is activation, the range is not always 0 ~ 255. For example RELU,
>
> I believe tflite extends the quantization range so it always includes 0, as done in the gemmlowp quantization example below. I have dumped my min and max saturation input values from the six q…

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread ds-jnorwood
> Not true. When there is activation, the range is not always 0 ~ 255. For example RELU,

I believe tflite extends the quantization range so it always includes 0, as done in the gemmlowp quantization example below. I have dumped my min and max saturation input values from the six quantized tfl…
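A minimal sketch of the range handling being described, loosely following the gemmlowp/TFLite asymmetric-quantization recipe: the float range is extended to include 0 and the zero point is nudged to an exact integer. The function name, struct, and the hard-coded 8-bit limits below are illustrative assumptions, not code from the thread.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: derive uint8 quantization parameters from a float
// [rmin, rmax] range, extending the range so real 0.0 is exactly representable.
struct QuantParams {
  float scale;
  int32_t zero_point;
};

QuantParams ChooseQuantParams(float rmin, float rmax) {
  // Extend the range to include 0, as the message describes.
  rmin = std::min(rmin, 0.0f);
  rmax = std::max(rmax, 0.0f);

  const float qmin = 0.0f, qmax = 255.0f;   // uint8 storage range
  const float scale = (rmax - rmin) / (qmax - qmin);

  // Nudge the zero point to an exact integer inside [qmin, qmax].
  const float initial_zero_point = qmin - rmin / scale;
  const int32_t zero_point = static_cast<int32_t>(
      std::round(std::min(qmax, std::max(qmin, initial_zero_point))));

  return {scale, zero_point};
}
```

With this choice the real value 0.0 maps exactly to `zero_point`, which is why a ReLU can later be expressed as a clamp at `zero_point` rather than at the raw integer 0.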

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread ds-jnorwood
In the tflite quantized Mobilenet v2 from the repository, the first conv operation has a non-zero offset ... there is no activation. The offset is 128. So either provide a conv which uses signed int8 and a 0 offset, or do what tflite does and handle it as a quantized uint8 convolution with a 128 offset…
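A tiny sketch of why the two options are equivalent: a uint8 value with zero point 128 and an int8 value with zero point 0 encode the same real number, since `q_int8 = q_uint8 - 128`. The helper names below are illustrative only.

```cpp
#include <cstdint>

// real = scale * (q_uint8 - 128) = scale * q_int8
float DequantU8(uint8_t q, float scale) {
  return scale * (static_cast<int32_t>(q) - 128);  // zero point 128
}
float DequantS8(int8_t q, float scale) {
  return scale * static_cast<int32_t>(q);          // zero point 0
}

// Shifting the representation changes the storage type, not the value.
int8_t ToSigned(uint8_t q) {
  return static_cast<int8_t>(static_cast<int32_t>(q) - 128);
}
```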

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Zhao Wu
> In that case min and max values passed into the quantized conv are always 0 and 255.

Not true. When there is activation, the range is not always 0 ~ 255. For example RELU,

```cpp
auto quantize = [scale, zero_point](float f) {
  return zero_point + static_cast<int32_t>(TfLiteRound(f / scale));
};
```
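For context, a simplified paraphrase of how a quantize lambda like the one above is used to derive the clamp bounds for a fused activation; the enum, function name, and `std::round` (standing in for `TfLiteRound`) are assumptions made for the sketch, not the exact TFLite declarations.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

enum class FusedActivation { kNone, kRelu, kRelu6 };  // illustrative stand-in

void ActivationRangeQuantized(FusedActivation act, float scale,
                              int32_t zero_point,
                              int32_t* act_min, int32_t* act_max) {
  const int32_t qmin = 0, qmax = 255;  // uint8 storage range
  auto quantize = [scale, zero_point](float f) {
    return zero_point + static_cast<int32_t>(std::round(f / scale));
  };
  if (act == FusedActivation::kRelu) {
    *act_min = std::max(qmin, quantize(0.0f));   // clamp at real 0.0
    *act_max = qmax;
  } else if (act == FusedActivation::kRelu6) {
    *act_min = std::max(qmin, quantize(0.0f));
    *act_max = std::min(qmax, quantize(6.0f));   // clamp at real 6.0
  } else {
    *act_min = qmin;                             // no activation: full 0 ~ 255
    *act_max = qmax;
  }
}
```

So whenever `zero_point` is non-zero, the RELU bounds are not 0 ~ 255, which is the point being made above.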

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread ds-jnorwood
> If no activation, we will clamp it to 0 / 127.

In the tflite quantized conv implementation (I posted an excerpt from their code previously) the offset is added in before the clamping. The tflite quantized models in their repository used uint8 asymmetric quantization with non-zero offset…
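A sketch of the output stage being described: the output offset is added to the requantized accumulator and only then is the result clamped. Variable names are illustrative, and the float multiplier is a simplification; the real TFLite reference kernel uses a fixed-point multiplier for the downscale.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative output stage of a quantized conv.
uint8_t RequantizeOutput(int32_t acc, float output_multiplier,
                         int32_t output_offset,
                         int32_t output_activation_min,
                         int32_t output_activation_max) {
  int32_t out = static_cast<int32_t>(std::round(acc * output_multiplier));
  out += output_offset;                        // offset is added first...
  out = std::max(out, output_activation_min);  // ...then the clamp is applied
  out = std::min(out, output_activation_max);
  return static_cast<uint8_t>(out);
}
```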

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Zhao Wu
> `https://arxiv.org/pdf/1803.08607.pdf`

Qualcomm's way? Let us look at Google's TFLite model:

![image](https://user-images.githubusercontent.com/7287321/59577624-0f541000-90f7-11e9-9044-2153d6f9ccda.png)

The quantized model does not remove RELU6 in dw conv / conv. I think we should focus…

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread ds-jnorwood
> I guess the saturation is exactly what activations (ReLU family) mean, semantically. :)

In the case of the tflite quantized models I've looked at, the batch normalization and relu6 operations in training are fused into the conv operations used during inference. You probably need to fuse…
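A minimal sketch of the batch-norm part of that fusion, using the standard folding formulas (per output channel c: w'[c] = w[c] * gamma[c] / sqrt(var[c] + eps), b'[c] = (b[c] - mean[c]) * gamma[c] / sqrt(var[c] + eps) + beta[c]). The layout and names are assumptions for illustration; after folding, the relu6 only survives as the clamp range baked into the quantization parameters.

```cpp
#include <cmath>
#include <vector>

// Fold batch-norm parameters into conv weights and bias, per output channel.
// weights is flattened as [out_channels][elems_per_channel].
void FoldBatchNorm(std::vector<float>& weights, std::vector<float>& bias,
                   const std::vector<float>& gamma,
                   const std::vector<float>& beta,
                   const std::vector<float>& mean,
                   const std::vector<float>& var,
                   int out_channels, float eps = 1e-5f) {
  const int per_ch = static_cast<int>(weights.size()) / out_channels;
  for (int c = 0; c < out_channels; ++c) {
    const float s = gamma[c] / std::sqrt(var[c] + eps);
    for (int i = 0; i < per_ch; ++i) weights[c * per_ch + i] *= s;
    bias[c] = (bias[c] - mean[c]) * s + beta[c];
  }
}
```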

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Zhao Wu
> > Although the quantized conv result is held in uint8, it could be static casted to signed int8, or even fewer than 8 bit quantization. That would require both min and max saturations, as in the reference tflite quantized conv implementation
>
> Ah, I see. That finally makes sense…

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread 黎明灰烬
> During inference, the min and max saturation values are just handling saturation of values seen outside the range expected from the training...

I guess the saturation is exactly what activations (ReLU family) mean, semantically. :)

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread 黎明灰烬
> > > It appears to me this would let them simulate smaller than 8 bit quantizations.
> >
> > If _simulating 8 smaller bit_ is the case, 8 bit should be able to hold activation min/max value.
>
> 8 bits could hold. But what the value output_min / output_max is? I think @jnorw…

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread ds-jnorwood
Yes, right. The scaling constant computed during training is based on the range of values seen after the fused-in activations (at least that is true for the tflite quantized models I've looked at). That includes being after the relu6 positive clipping as well. During inference, the min and max sat…

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread 黎明灰烬
> So, this is not about activation.

Of course it comes from activation, and is related to zero point and scale. Maybe you can read the whole implementation rather than secondhand messages. For this min/max activation:

1. They are even named with activation when used in the computing kernel: http…

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Animesh Jain
> Although the quantized conv result is held in uint8, it could be static casted to signed int8, or even fewer than 8 bit quantization. That would require both min and max saturations, as in the reference tflite quantized conv implementation

Ah, I see. That finally makes sense. So, this i…

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread ds-jnorwood
The min and max are not conditional on the existence of an activation operation in the original model. They are there to saturate the downscaled and offset-adjusted 32-bit signed int accumulator to the min and max values of the uint8 quantized bit range. Although the quantized conv result is held in…
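One way to read the fewer-than-8-bit point: the same min/max clamp that saturates the requantized, offset-adjusted accumulator to [0, 255] can also narrow it to a smaller simulated bit width. The helper below is an illustrative assumption, not code from the thread.

```cpp
#include <algorithm>
#include <cstdint>

// Clamp the requantized (and offset-adjusted) accumulator to the range of the
// simulated bit width: 0..255 for 8 bits, 0..15 for 4 bits, and so on.
int32_t SaturateToRange(int32_t requantized_plus_offset, int num_bits) {
  const int32_t qmin = 0;
  const int32_t qmax = (1 << num_bits) - 1;
  return std::min(qmax, std::max(qmin, requantized_plus_offset));
}
```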

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Animesh Jain
> I think it is ok. If we do this way, we should insert one clamp if we have activation.
> Like our tflite frontend

Yes, I agree with that. That's exactly what I was thinking.