Dear Community:
I load my ONNX model using `tvm.relay.frontend.from_onnx`, then convert it to 
int8 using the following code:

> with relay.quantize.qconfig():
>     mod = relay.quantize.quantize(mod, params)

This gives an execution time of 25 ms, versus 40 ms for FP32. But if I 
quantize the model with the following code instead:
> mod = relay.quantize.quantize(mod, params)

the execution time goes back to 40 ms, slower than quantizing inside the 
`with relay.quantize.qconfig()` block.
I also tried loading a pre-quantized model from ONNX Runtime; its performance 
is the same as the quantized model without qconfig.
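For clarity, here is a minimal sketch of the two paths I am comparing. This is illustration only, not a runnable benchmark; the keyword arguments shown (`nbit_input`, `nbit_weight`, `calibrate_mode`, `global_scale`) are my understanding of qconfig's defaults and should be treated as assumptions:

```python
from tvm import relay

def quantize_with_qconfig(mod, params):
    # Path 1: quantize inside a qconfig scope. qconfig() is a context
    # manager that makes a QConfig the current quantization config;
    # quantize() then reads that config. The kwargs below spell out
    # what I believe the defaults are (assumption).
    with relay.quantize.qconfig(nbit_input=8,
                                nbit_weight=8,
                                calibrate_mode="global_scale",
                                global_scale=8.0):
        return relay.quantize.quantize(mod, params)

def quantize_without_qconfig(mod, params):
    # Path 2: quantize with no explicit qconfig scope; quantize()
    # falls back to whatever the current (default) config is.
    return relay.quantize.quantize(mod, params)
```

My expectation was that both paths resolve to the same default config, yet only path 1 gives the 25 ms result.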

So here are my questions:

1. Why are the performances different? Both ways should use the default 
quantize config.
2. What exactly does `with relay.quantize.qconfig()` do? If I want to load a 
pre-quantized model, how can I get better performance?





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/same-tvm-quantize-method-result-in-different-performances/15218/1)
 to respond.
