I'd like to make sure the end goal of this framework is to create a fully 
quantized graph, i.e. one with all operators in affine space.

Unlike the usual constraint in TVM that a graph rewrite doesn't change the 
outcome, quantization obviously does. Statistics must be available to help 
answer by how much.
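
To make the kind of statistics I mean concrete, here is a rough sketch in plain Python, not tied to any TVM API. The helper names (`quantize`, `dequantize`, `error_stats`) and the choice of metrics are my own assumptions for illustration; it compares a float reference against a simulated int8 affine round trip.

```python
import math

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value into int8 affine space, saturating at the range limits."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an affine-space integer back to a real value."""
    return scale * (q - zero_point)

def error_stats(reference, simulated):
    """Summarize how far the simulated output is from the float reference.

    Assumes a non-empty, non-zero reference signal.
    """
    abs_err = [abs(r - s) for r, s in zip(reference, simulated)]
    signal = sum(r * r for r in reference)
    noise = sum((r - s) ** 2 for r, s in zip(reference, simulated))
    sqnr_db = float("inf") if noise == 0 else 10 * math.log10(signal / noise)
    return {
        "max_abs_error": max(abs_err),
        "mean_abs_error": sum(abs_err) / len(abs_err),
        "sqnr_db": sqnr_db,
    }

# Hypothetical float outputs of some operator, and hypothetical quantization params.
reference = [-1.0, -0.5, 0.0, 0.25, 0.9]
scale, zero_point = 2.0 / 255, 0

simulated = [dequantize(quantize(x, scale, zero_point), scale, zero_point)
             for x in reference]
stats = error_stats(reference, simulated)
print(stats)
```

As long as no value saturates, the maximum absolute error stays within half a quantization step (`scale / 2`); larger values in the report point at clipping.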

>From a BYOC point of view, some groups of operators may be replaced by 
>an efficient hardware equivalent, for example conv-add-relu. Also, math 
>functions may be replaced by a lookup table (LUT). 
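
As an illustration of the LUT idea, here is a minimal sketch assuming int8 input and output, so that the whole function fits in a 256-entry table. The names `build_lut` and `lut_apply` and the scales used are hypothetical, not something from the RFC or from TVM.

```python
import math

def build_lut(fn, in_scale, in_zp, out_scale, out_zp, qmin=-128, qmax=127):
    """Precompute fn for every possible int8 input, entirely in affine space."""
    table = []
    for q in range(qmin, qmax + 1):
        x = in_scale * (q - in_zp)            # dequantize the input
        y = round(fn(x) / out_scale) + out_zp  # requantize the output
        table.append(max(qmin, min(qmax, y)))  # saturate to the output range
    return table

def lut_apply(table, q, qmin=-128):
    """Evaluate the function with a single table lookup."""
    return table[q - qmin]

# Hypothetical parameters: input covers [-4, 4), output covers roughly [-1, 1].
tanh_lut = build_lut(math.tanh, in_scale=4.0 / 128, in_zp=0,
                     out_scale=1.0 / 127, out_zp=0)

print(lut_apply(tanh_lut, 0), lut_apply(tanh_lut, 127), lut_apply(tanh_lut, -128))
```

At run time the math function then costs one indexed load per element, which is the kind of replacement a BYOC backend could map to hardware.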

The transformed graph is a simulated quantized graph that allows the user or 
the quantization framework to always simulate the output and handle 
quantization error. I don't think we need to provide all combinations, but 
hooks should be in place to allow such custom, user-defined handling.

Finally, the proposal may be missing a definition of accumulators in affine 
space. While weights, inputs (constant or dynamic) and outputs will be in 
affine space, e.g. int8 dtype, it is important to be able to specify which 
dtype intermediate math operations will use, for example int32. If we allow 
any kind of dtype, then the simulated quantized graph should be able to answer 
how many bits are needed before saturation. Again, I view such answers as part 
of the statistics the user can analyze. At the TIR level, such accumulators 
may lead to efficient, hardware-dependent transformations.
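
For the "how many bits before saturation" question, a worst-case bound is easy to sketch. The snippet below is only an illustration under the assumption of signed int8 operands feeding a multiply-accumulate; the function names are hypothetical, and a real framework would probably also report bounds observed from calibration data, which are usually much tighter.

```python
def accumulator_bits(num_terms, operand_bits=8):
    """Smallest signed width that can hold a sum of num_terms products
    of two signed operand_bits-wide values without saturating."""
    worst_operand = 1 << (operand_bits - 1)        # 128 for int8, from -128
    worst_sum = num_terms * worst_operand * worst_operand
    # A signed n-bit integer holds magnitudes up to 2**(n-1) - 1,
    # so we need n - 1 >= worst_sum.bit_length().
    return worst_sum.bit_length() + 1

def terms_before_saturation(acc_bits, operand_bits=8):
    """Largest number of worst-case products a signed accumulator can sum."""
    worst_product = (1 << (operand_bits - 1)) ** 2
    return ((1 << (acc_bits - 1)) - 1) // worst_product

# Example: a 3x3 convolution over 64 input channels accumulates 576 products.
print(accumulator_bits(3 * 3 * 64))      # 25
print(terms_before_saturation(32))       # 131071
print(terms_before_saturation(16))       # 1
```

So an int32 accumulator is safe for any int8 reduction up to 131071 terms, while int16 can already saturate on the second worst-case product.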




