> Although the quantized conv result is held in uint8, it could be static casted to signed int8, or even fewer than 8 bit quantization. That would require both min and max saturations, as in the reference tflite quantized conv implementation
Ah, I see. That finally makes sense. So this is not about the activation; it is about the representation used to store the quantized form of the floating point values. For example, if the representation is 7 bits wide, we will need the output min/max saturations. Cool, I will add them to the API and add the corresponding documentation.
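To make the 7-bit case concrete, here is a minimal sketch of what output min/max saturation could look like. It is plain Python, not the TVM API or the tflite reference code; the function name `saturating_requantize` and its parameters (`scale`, `zero_point`, `out_min`, `out_max`) are hypothetical and chosen only for illustration.

```python
def saturating_requantize(acc, scale, zero_point, out_min, out_max):
    """Rescale accumulator values and clamp them to [out_min, out_max].

    Hypothetical helper: the name and signature are assumptions,
    not part of any existing API.
    """
    result = []
    for value in acc:
        # Rescale the wide accumulator into the output quantized domain.
        scaled = int(round(value * scale)) + zero_point
        # Saturate on both ends; the bounds depend on the chosen
        # representation, not on the container dtype.
        result.append(max(out_min, min(out_max, scaled)))
    return result


acc = [-500, 0, 3000, 20000]

# 7-bit unsigned values held in a uint8 container: valid range is 0..127.
seven_bit = saturating_requantize(acc, 0.01, 0, 0, 127)

# Signed int8 values: valid range is -128..127.
signed_int8 = saturating_requantize(acc, 0.01, 0, -128, 127)

print(seven_bit)
print(signed_int8)
```

The same accumulator values saturate differently depending on the bounds passed in, which is why both `out_min` and `out_max` would need to be part of the API rather than being implied by the uint8 container.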