Hi everyone,
I would like to know whether int4/int16 quantization is possible using `relay.quantize.quantize`. So far, I have gone through the documentation in https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/quantize/quantize.py, but I have a few questions:

1) What is the difference between `nbit_weight` and `dtype_weight`? I expected that the type of the workloads for my tasks would change by changing only `nbit_weight`, but I also had to set `dtype_weight = "int16"` to achieve that. The same applies to `nbit_input` and `dtype_input`.

2) Which parameters do you have to modify to get int16 quantization? So far, my code is:

```python
with relay.quantize.qconfig(calibrate_mode='global_scale',
                            nbit_input=16,
                            nbit_weight=16,
                            dtype_input="int16",
                            dtype_weight="int16",
                            global_scale=8.0):
    mod = relay.quantize.quantize(mod, params=dict_params)
```

Would this be enough?

3) In the literature, quantization often takes the form `x_int = x_float / scale + offset`. Is there an `offset` parameter available in `relay.quantize.qconfig`?
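For concreteness, here is a minimal NumPy sketch of the scheme I mean in 3); the names `quantize`, `dequantize`, `scale`, and `offset` are taken from the literature, not from the TVM API:

```python
import numpy as np

# Minimal sketch of affine quantization as described in the literature:
#   x_int = round(x_float / scale) + offset
# followed by the matching dequantization. This is NOT TVM's
# implementation; it only illustrates the role of `offset`.

def quantize(x_float, scale, offset, dtype=np.int16):
    info = np.iinfo(dtype)
    x_int = np.round(x_float / scale) + offset
    return np.clip(x_int, info.min, info.max).astype(dtype)

def dequantize(x_int, scale, offset):
    return (x_int.astype(np.float32) - offset) * scale

x = np.array([-1.5, 0.0, 0.75, 2.0], dtype=np.float32)
xq = quantize(x, scale=0.01, offset=0)
print(xq, dequantize(xq, scale=0.01, offset=0))
```

In other words, I am asking whether `relay.quantize.qconfig` exposes such an offset (i.e., asymmetric quantization), or whether it only supports the symmetric case where `offset = 0`.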