> Although the quantized conv result is held in uint8, it could be statically 
> cast to signed int8, or even to a quantization of fewer than 8 bits. That would 
> require both min and max saturations, as in the reference tflite quantized 
> conv implementation.

Ah, I see. That finally makes sense.
So this is not about activation. It is about which representation one is 
using to store the quantized values. For example, if it is 7 bits held in a 
uint8 container, we will need both the output min and max saturations. Cool, 
I will add them to the API and add the corresponding documentation.
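
To make sure we mean the same thing, here is a rough sketch of the saturation 
step as I understand it. The names quantized_range and requantize are 
hypothetical, made up for illustration; this is neither the TVM API nor the 
tflite reference code, and the single float scale is a simplifying assumption.

```python
def quantized_range(num_bits, signed):
    """Return the (min, max) integers representable in num_bits."""
    if signed:
        return -(1 << (num_bits - 1)), (1 << (num_bits - 1)) - 1
    return 0, (1 << num_bits) - 1


def requantize(acc, scale, zero_point, out_min, out_max):
    """Rescale an integer accumulator, then saturate to [out_min, out_max].

    Hypothetical helper: real implementations typically use fixed-point
    multipliers rather than a float scale.
    """
    value = int(round(acc * scale)) + zero_point
    # Both bounds are needed: the container type alone (e.g. uint8)
    # does not enforce a narrower range such as 7 bits.
    return max(out_min, min(out_max, value))


# 7-bit unsigned values stored in a uint8 container.
out_min, out_max = quantized_range(7, signed=False)
print(out_min, out_max)
print(requantize(1000, 0.25, 0, out_min, out_max))  # above range, clamps to out_max
print(requantize(-40, 0.25, 0, out_min, out_max))   # below range, clamps to out_min
```

With an 8-bit unsigned output the bounds coincide with what uint8 can hold 
anyway, which is probably why I read them as an activation clamp at first.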

-- 
https://github.com/dmlc/tvm/issues/2351#issuecomment-502492887