Hi everyone,

I would like to know whether int4/int16 quantization is possible using
`relay.quantize.quantize`. So far, I have gone through the documentation in

https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/quantize/quantize.py

But I have a few questions:

1) What is the difference between `nbit_weight` and `dtype_weight`? I was
expecting the type of the workloads for my tasks to change by only
changing `nbit_weight`, but I also had to set `dtype_weight = "int16"` to
achieve that. The sketch below shows the two attempts.

The above also applies to `nbit_input` and `dtype_input`.
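For reference, here is a minimal sketch of what I tried (it assumes `mod` and `dict_params` are already built from my model):

```python
from tvm import relay

# Changing only nbit_weight did not change the workload types for me:
with relay.quantize.qconfig(nbit_weight=16):
    mod_q = relay.quantize.quantize(mod, params=dict_params)

# Only after also setting dtype_weight did the weights show up as int16:
with relay.quantize.qconfig(nbit_weight=16, dtype_weight="int16"):
    mod_q = relay.quantize.quantize(mod, params=dict_params)
```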

2) What parameters do you have to modify to get int16
quantization? So far, my code is:

```python
with relay.quantize.qconfig(calibrate_mode='global_scale',
                            nbit_input=16, nbit_weight=16,
                            dtype_input="int16", dtype_weight="int16",
                            global_scale=8.0):
    mod = relay.quantize.quantize(mod, params=dict_params)
```

Would this be enough?
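For context, here is a self-contained version of the flow I have in mind (the resnet-18 workload is just a placeholder for my model; I am also unsure whether `nbit_activation`/`dtype_activation` would need to be set as well):

```python
from tvm import relay
from tvm.relay import testing

# Placeholder workload; in my case mod/dict_params come from my own model.
mod, dict_params = testing.resnet.get_workload(num_layers=18, batch_size=1)

with relay.quantize.qconfig(calibrate_mode='global_scale',
                            nbit_input=16, nbit_weight=16,
                            dtype_input="int16", dtype_weight="int16",
                            global_scale=8.0):
    mod = relay.quantize.quantize(mod, params=dict_params)

print(mod)  # inspect the IR to check whether weights/casts are now int16
```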

3) In the literature, you often find that quantization takes the form:

`x_int = x_float / scale + offset`

Is there any `offset` available in the `relay.quantize.qconfig` function?
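To make the question concrete, here is the asymmetric scheme I mean as a toy NumPy snippet (the function name and values are mine, not TVM's):

```python
import numpy as np

def affine_quantize(x_float, scale, offset, nbits=8):
    # x_int = round(x_float / scale) + offset, clipped to the signed range.
    qmin, qmax = -(2 ** (nbits - 1)), 2 ** (nbits - 1) - 1
    x_int = np.clip(np.round(x_float / scale) + offset, qmin, qmax)
    return x_int.astype(np.int32)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
print(affine_quantize(x, scale=0.01, offset=0))  # -> [-100    0   50  100]
```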
