I just want to point out, again, that output_activation_min and output_activation_max are required even if no activation operation is specified: they saturate the requantized result to the quantization range, which avoids overflow errors.
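To illustrate the point, here is a minimal numpy sketch of that clamp step. The function name and the float-based rescale are my own simplification (the real TFLite kernels do this with fixed-point multiplies in C++); the key part is that min/max default to the full range of the output dtype even when no activation is fused:

```python
import numpy as np

def requantize_and_clamp(acc_int32, multiplier, shift,
                         output_zero_point,
                         output_activation_min, output_activation_max):
    """Requantize an int32 accumulator down to the int8/uint8 output range.

    Even with no fused activation, output_activation_min/max are set to the
    full range of the output dtype (e.g. -128..127 for int8), so the result
    saturates instead of wrapping around on overflow.
    """
    # Rescale acc * multiplier * 2^shift (float approximation of the
    # fixed-point math used in the actual kernels)
    scaled = np.round(acc_int32 * multiplier * (2.0 ** shift)).astype(np.int64)
    out = scaled + output_zero_point
    # Saturating clamp to the quantized output range
    return np.clip(out, output_activation_min, output_activation_max)
```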
Also, if you fuse activation operations during training, prior to the re-quantization, you gain an extra bit of resolution for the quantization. I believe tflite has done this in all the quantized inference models in their repository.
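A rough sketch of the resolution argument, with made-up numbers: if a ReLU6 is fused before requantization, the output quantization range only has to cover [0, 6] rather than a symmetric range like [-6, 6], so the same 8 bits give half the step size, i.e. one extra bit of resolution:

```python
def affine_params(real_min, real_max, num_bits=8):
    """Scale/zero-point for an asymmetric quantizer over [real_min, real_max]."""
    qmax = 2 ** num_bits - 1
    scale = (real_max - real_min) / qmax
    zero_point = int(round(-real_min / scale))
    return scale, zero_point

# Without fusing the activation: the layer output range is symmetric, say [-6, 6]
scale_unfused, _ = affine_params(-6.0, 6.0)   # step ~0.047
# With ReLU6 fused before requantization: only [0, 6] must be represented
scale_fused, _ = affine_params(0.0, 6.0)      # step ~0.024, i.e. one extra bit
print(scale_unfused, scale_fused)
```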