Dear Community: I load my ONNX model using *tvm.relay.frontend.from_onnx*, then convert it to int8 with the following code:
```python
with relay.quantize.qconfig():
    mod = relay.quantize.quantize(mod, params)
```

This gives an execution time of 25 ms, versus 40 ms for the fp32 model. But if I quantize the model with the following code instead:

```python
mod = relay.quantize.quantize(mod, params)
```

the execution time goes back to 40 ms, slower than quantizing inside the `with relay.quantize.qconfig()` block. I also tried loading a pre-quantized model from ONNX Runtime, and its performance matches that of the quantized model built without qconfig.

So my questions are:

1. Why do the performances differ? Both ways should use the default quantization config.
2. What exactly does `with relay.quantize.qconfig()` do? If I want to load a pre-quantized model, how can I get the better performance?

---

[Visit Topic](https://discuss.tvm.apache.org/t/same-tvm-quantize-method-result-in-different-performances/15218/1) to respond.
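For context on question 2: `relay.quantize.qconfig()` is a context manager that installs a quantization configuration for the enclosing scope, which `relay.quantize.quantize` then reads. Below is a minimal, self-contained sketch of that scoped-config pattern in plain Python; the names (`_DEFAULTS`, `current_qconfig`, the config keys) are illustrative stand-ins, not TVM's actual internals.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a scoped quantization config; keys are
# illustrative, not TVM's real qconfig fields.
_DEFAULTS = {"nbit_input": 8, "nbit_weight": 8, "calibrate_mode": "global_scale"}
_current = None  # holds the active config while inside a qconfig() block


@contextmanager
def qconfig(**overrides):
    """Install a quantization config for the enclosing `with` scope."""
    global _current
    previous = _current
    _current = {**_DEFAULTS, **overrides}
    try:
        yield _current
    finally:
        _current = previous  # restore the outer config, even on error


def current_qconfig():
    """Config seen by quantize(): the scoped config if set, else defaults."""
    return _current if _current is not None else _DEFAULTS


def quantize(mod):
    cfg = current_qconfig()
    # A real quantization pass would rewrite `mod` according to cfg;
    # here we only report which config was in effect.
    return f"quantized with nbit_weight={cfg['nbit_weight']}"


print(quantize("mod"))        # defaults in effect
with qconfig(nbit_weight=4):
    print(quantize("mod"))    # scoped override in effect
print(quantize("mod"))        # defaults restored after the block
```

The point of the pattern is that `quantize` consults whatever config is currently in scope, so calling it inside versus outside a `with qconfig(...)` block can legitimately take different configuration paths.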