> To complete the picture, suppose the quantized framework graph is (fw stands for framework)
>
> `fw.quantize -> fw.qconv2d -> fw.qrelu -> fw.dequantize`
If you execute the qconv2d and qrelu operations sequentially, each via its analogous floating-point operation, the output of qrelu inherits the (potentially worse) resolution of the initial qconv2d. So you need to be careful if you are using the fully sequential, separate-operation results as a reference.

I can see that you might want the graph to represent all the operations before the implementation is optimized. I just want to point out that the qrelu implementation can avoid the lowered resolution and be completely cost-free by revising the downscale multiplier and zero point of the preceding quantized-output operation (qconv2d in this case). It is cost-free because the clipping values are required anyway to perform the quantized range saturation.

Revising the downscale multiplier of a previous graph operation is also useful for zero-cost replacement of the scale-normalization operations in the quantized concat operations of the Inception models.
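To make the fusion concrete, here is a minimal NumPy sketch of the requantization step that qconv2d must perform anyway. This is not TVM or framework code; the `requantize` helper, its parameters, and the example values are illustrative assumptions. The key point: in the quantized domain, ReLU is just a clamp at the output zero point, so raising the lower saturation bound to the zero point implements qrelu with no extra op and no extra resolution loss.

```python
# Hypothetical sketch of fusing qrelu into qconv2d's requantization step.
import numpy as np

def requantize(acc, downscale, zero_point, qmin=0, qmax=255, fuse_relu=False):
    """Scale int32 conv2d accumulators down to uint8 output.

    acc        : int32 accumulator values from the quantized conv2d
    downscale  : float multiplier (input_scale * weight_scale / output_scale)
    zero_point : quantized value representing real 0 in the output
    fuse_relu  : if True, raise the lower clip bound to the zero point,
                 which is exactly ReLU in the quantized domain -- the
                 clamp is needed for range saturation anyway, so this
                 is cost-free.
    """
    q = np.round(acc * downscale).astype(np.int64) + zero_point
    lo = max(qmin, zero_point) if fuse_relu else qmin
    return np.clip(q, lo, qmax).astype(np.uint8)

acc = np.array([-1000, -10, 0, 60, 5000], dtype=np.int32)
fused = requantize(acc, downscale=0.05, zero_point=10, fuse_relu=True)
print(fused)  # [ 10  10  10  13 255] -- values below real 0 clip to zero_point
```

A separate qrelu would instead clamp the already-saturated uint8 output at the zero point; the result is the same here, but if qrelu requantizes to its own scale, it carries forward the (worse) resolution of the qconv2d output, which is the pitfall described above.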