I just want to point out, again, that output_activation_min and
output_activation_max are required even when no activation operation is
specified: they saturate the result to the quantization range, which avoids
overflow errors.
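
Here is a minimal sketch of what I mean (not TFLite's actual kernel code; the
function name and the floating-point approximation of the fixed-point multiply
are my own, for illustration). Even with no fused activation, the bounds are
simply the limits of the output dtype, and the clamp is what keeps the final
cast from wrapping around:

```python
import numpy as np

def requantize(acc_int32, multiplier, shift, output_zero_point,
               output_activation_min, output_activation_max):
    """Requantize an int32 accumulator to uint8, saturating the result
    to [output_activation_min, output_activation_max]."""
    # Scale the accumulator back into the output quantization domain
    # (real kernels do this with a fixed-point multiply; a float
    # approximation is used here to keep the sketch short).
    scaled = np.round(acc_int32 * multiplier * (2.0 ** shift)).astype(np.int64)
    shifted = scaled + output_zero_point
    # Saturate to the quantization range -- this is why the min/max are
    # required even when no activation is fused.
    clamped = np.clip(shifted, output_activation_min, output_activation_max)
    return clamped.astype(np.uint8)

# No fused activation: clamp to the full uint8 range [0, 255].
out = requantize(np.array([-40000, 120000]), multiplier=0.01, shift=0,
                 output_zero_point=128,
                 output_activation_min=0, output_activation_max=255)
print(out)  # [  0 255] -- saturated instead of wrapped around
```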

Also, if you fuse activation operations during training, prior to
requantization, you gain an extra bit of resolution for quantization (see the
sketch below). I believe tflite has done this in all the quantized inference
models in their repository.
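
A rough back-of-the-envelope illustration of the "extra bit" (the numbers are
hypothetical, just to show the arithmetic): if the quantization parameters are
chosen after a ReLU has been fused in, the whole 8-bit range covers only the
non-negative values, so the step size halves compared to quantizing the
symmetric pre-activation range.

```python
# Hypothetical tensor whose pre-activation values span [-6.0, 6.0].
real_min, real_max = -6.0, 6.0
levels = 256  # uint8

# Quantize before fusing the activation: half the 256 levels are spent
# on negative values that a following ReLU will zero out anyway.
scale_unfused = (real_max - real_min) / (levels - 1)

# Fuse the ReLU first, then pick quantization parameters: all 256
# levels now cover [0, 6.0].
scale_fused = (real_max - 0.0) / (levels - 1)

print(scale_unfused / scale_fused)  # 2.0 -> one extra bit of resolution
```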
