The TFLite quantized convolution reference implementation passes in both activation limits (`output_activation_min` and `output_activation_max`) as int32 values. It appears to me this would let them simulate quantizations narrower than 8 bits, if that is something you want to support.
For reference, this is the relevant excerpt from `tensorflow/lite/kernels/internal/reference/conv.h`:

```cpp
acc = MultiplyByQuantizedMultiplier(acc, output_multiplier, output_shift);
acc += output_offset;
acc = std::max(acc, output_activation_min);
acc = std::min(acc, output_activation_max);
output_data[Offset(output_shape, batch, out_y, out_x, out_channel)] =
    static_cast<uint8>(acc);
```

View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-502401254