The TFLite quantized convolution reference implementation takes both clamping
limits (`output_activation_min` and `output_activation_max`) as int32 values.
It appears to me that this would let it simulate quantization narrower than
8 bits, if that is something you want to support.

This is from `tensorflow/lite/kernels/internal/reference/conv.h`:


```
acc = MultiplyByQuantizedMultiplier(acc, output_multiplier, output_shift);
acc += output_offset;
acc = std::max(acc, output_activation_min);
acc = std::min(acc, output_activation_max);
output_data[Offset(output_shape, batch, out_y, out_x, out_channel)] =
    static_cast<uint8>(acc);
```
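To make the point concrete, here is a minimal standalone sketch of the same clamp-and-store step with narrower limits. It is not TFLite code: `ClampAndStore` and `MaxForBits` are hypothetical helpers I made up for illustration, and I am assuming an unsigned range of `[0, 2^bits - 1]` stored in a uint8 container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of TFLite): largest value of an unsigned
// quantized range with the given bit width, assumed to be 1..8 here.
inline std::int32_t MaxForBits(int bits) {
  assert(bits >= 1 && bits <= 8);
  return (std::int32_t{1} << bits) - 1;
}

// Hypothetical helper (not part of TFLite): mirrors the tail of the snippet
// above. `acc` is assumed to be already requantized; it gets the output
// offset added, is clamped to [act_min, act_max], and is stored as uint8.
inline std::uint8_t ClampAndStore(std::int32_t acc, std::int32_t output_offset,
                                  std::int32_t act_min, std::int32_t act_max) {
  acc += output_offset;
  acc = std::max(acc, act_min);
  acc = std::min(acc, act_max);
  return static_cast<std::uint8_t>(acc);
}
```

With `act_max = MaxForBits(4)`, every stored value falls in `[0, 15]`, so the uint8 buffer would effectively hold 4-bit data. Whether the rest of the kernel (multiplier, offset) is set up sensibly for such a range is a separate question that this sketch does not address.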

 

https://github.com/dmlc/tvm/issues/2351#issuecomment-502401254
