The TFLite quantized convolution reference implementation passes in both
limits (`output_activation_min` and `output_activation_max`) as int32 values.
It appears to me that this would let it simulate quantization narrower than
8 bits, if that is something you want to support.
This is from `tensorflow/lite/kernels/internal/reference/conv.h`:
```
acc = MultiplyByQuantizedMultiplier(acc, output_multiplier,
                                    output_shift);
acc += output_offset;
acc = std::max(acc, output_activation_min);
acc = std::min(acc, output_activation_max);
output_data[Offset(output_shape, batch, out_y, out_x, out_channel)] =
    static_cast<uint8>(acc);
```
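To illustrate what I mean, here is a minimal sketch of how a narrower clamp range could be fed through those last steps. `ActivationRange`, `UnsignedRangeForBits` and `ClampAndNarrow` are hypothetical names I made up for this example, not TFLite APIs, and the sketch assumes an unsigned range starting at 0 and leaves out the requantization step entirely.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical holder for the two clamp limits (not a TFLite type).
struct ActivationRange {
  std::int32_t min;
  std::int32_t max;
};

// Hypothetical helper: the [0, 2^bits - 1] range of an unsigned quantized
// value with `bits` bits. Assumes 1 <= bits <= 8 so the result still fits
// in the uint8 output buffer.
inline ActivationRange UnsignedRangeForBits(int bits) {
  return {0, (std::int32_t{1} << bits) - 1};
}

// Mirrors the final steps of the quoted snippet: add the output offset,
// clamp to the activation limits, then narrow to uint8. The call to
// MultiplyByQuantizedMultiplier is intentionally omitted here.
inline std::uint8_t ClampAndNarrow(std::int32_t acc,
                                   std::int32_t output_offset,
                                   ActivationRange range) {
  acc += output_offset;
  acc = std::max(acc, range.min);
  acc = std::min(acc, range.max);
  return static_cast<std::uint8_t>(acc);
}
```

With `UnsignedRangeForBits(4)` the stored values never exceed 15, even though the storage type is still uint8. Whether the multiplier and offset upstream are computed consistently with such a narrower range is a separate question that this sketch does not address.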