> > > > For the `q_conv2d`, we will add two more arguments:
> > > >
> > > > ```python
> > > > output_min=0,
> > > > output_max=0
> > > > ```
> > > >
> > > > These will be used to restrict the output range, which could be calculated previously.
> > >
> > > I see what you are saying, but I am not sure if this is the right approach. In my opinion, it would be better to keep it out of conv. The reason we have these two extra min/max arguments is the fused activation in TFLite. It seems better to keep them separate so that both MXNet and TFLite can share quantized_conv2d. In the case of TFLite, when we see a fused conv, we can add one more clamp operator at the end of the sequence of ops.
> >
> > No matter whether we have a fused activation function, we always need output_min / output_max. The conv produces an int32 result, but we need a uint8 result, so we must restrict the int32 values to the uint8 range. If there is no fused activation function (quantized TFLite models often have none), output_min / output_max will be 0 / 255 to restrict the int32 result. If we have relu6, output_min / output_max will be 0 / 6. So I think we are better off putting these two into the conv arguments. We could then avoid producing another clamp; it is handled naturally in conv2d's requantize step (int32 -> uint8).
>
> In the case the activation is not fused, the values have to be clamped to 0/255, i.e. the uint8 range, which is basically the out_dtype. So we do not need any extra information for quantized_conv2d to go back to uint8/int8 other than out_dtype. Correct?
>
> Now, if the activation is fused, I agree that we will have two clamps: one inside quantized_conv2d (0/255) and one for the relu6 (0/6). I think this is fine. We can also write a Relay pass that replaces two back-to-back clamps with one clamp operator.
>
> The reason I am saying this is that TFLite chooses one way to handle things, which other frameworks might not. So it is necessary to come up with the right abstractions first. The performance can then be achieved by writing Relay passes.
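To keep the point above concrete, here is a minimal NumPy sketch of the requantize-and-clamp step being discussed; the function and parameter names are made up for illustration, and this is a float reference rather than the fixed-point arithmetic a real kernel would use. With output_min / output_max = 0 / 255 the clamp is just uint8 saturation; a narrower range folds a fused activation into the same clamp.

```python
import numpy as np

def requantize_and_clamp(acc_int32, input_scale, kernel_scale,
                         output_scale, output_zero_point,
                         output_min=0, output_max=255):
    """Requantize an int32 conv accumulator to uint8 with a single clamp.

    output_min / output_max = 0 / 255 is plain uint8 saturation (no fused
    activation); a narrower range folds a fused activation such as relu6
    into the same clamp. Float reference only -- real kernels would use a
    fixed-point multiplier instead.
    """
    # Effective multiplier from accumulator scale to output scale.
    real_multiplier = (input_scale * kernel_scale) / output_scale
    scaled = np.round(acc_int32.astype(np.float64) * real_multiplier)
    shifted = scaled + output_zero_point
    clamped = np.clip(shifted, output_min, output_max)
    return clamped.astype(np.uint8)

acc = np.array([-1000, 0, 5000, 90000], dtype=np.int32)
# No fused activation: clamp is just the uint8 range -> [  0   0  20 255]
print(requantize_and_clamp(acc, 0.02, 0.01, 0.05, 0))
# Fused relu6 with output_scale=0.05, zero_point=0: real 6.0 maps to 120,
# so the same clamp also applies the activation -> [  0   0  20 120]
print(requantize_and_clamp(acc, 0.02, 0.01, 0.05, 0, output_min=0, output_max=120))
```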
Yes, I agree that when we don't have an activation, we don't need anything extra. However, another thing we should consider is how to integrate with other libraries, such as QNNPACK. QNNPACK also needs output min / output max: https://github.com/pytorch/QNNPACK/blob/master/include/qnnpack.h#L62-L63
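On the "calculated previously" part: since QNNPACK takes output_min / output_max at operator-creation time (per the header linked above), a frontend would typically fold the fused activation into the quantized bounds up front. A rough sketch of how that computation might look, with illustrative names that are not from any existing codebase:

```python
import numpy as np

def fused_activation_bounds(output_scale, output_zero_point,
                            activation=None, qmin=0, qmax=255):
    """Map an optional fused activation onto uint8 clamp bounds.

    No fused activation keeps the full uint8 range; for relu / relu6 the
    real-valued limits (0, or 0 and 6) are converted into the quantized
    domain so that a single clamp in the conv epilogue covers both
    saturation and the activation.
    """
    def quantize(real_value):
        q = output_zero_point + int(round(real_value / output_scale))
        return int(np.clip(q, qmin, qmax))

    if activation is None:
        return qmin, qmax
    if activation == "relu":
        return quantize(0.0), qmax
    if activation == "relu6":
        return quantize(0.0), quantize(6.0)
    raise ValueError("unsupported fused activation: %s" % activation)

# e.g. relu6 with output_scale=0.05, zero_point=0 -> (0, 120)
print(fused_activation_bounds(0.05, 0, activation="relu6"))
```

Whether the bounds are expressed this way (in the quantized domain) or in real units is of course part of the interface question being discussed here.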