hello, I also hit an ONNX auto-tune build problem. Can you run relay.build after auto-tune has finished?
```
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(
        mod, target=target, params=params)
```
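For reference, here is a minimal sketch of how I understand relay.build should be called after auto-tuning so the tuned schedules are actually applied (the log file name `tune.log` is an assumption; `mod`, `target`, and `params` come from the frontend):
```
import tvm
from tvm import relay, autotvm

# Apply the best records found during auto-tuning, then build.
with autotvm.apply_history_best("tune.log"):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
```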
More info about my problem.
[Auto-tune finished, b…]
I guess the error may be caused by a difference in model loading between the ONNX and MXNet models, focusing on `relay.Function` and `tvm.IRModule.from_expr`.
**1. I am not sure what they are used for, and what should I write for an ONNX model?**
```
def customed_network_from_onnx(model_path, input_shapes, ...
```
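For what it is worth, here is the kind of minimal sketch I would expect for an ONNX model (the helper name, the input name `data`, and the shape are my assumptions): `relay.frontend.from_onnx` already returns a `tvm.IRModule` together with the params dict, so `relay.Function` / `tvm.IRModule.from_expr` should not be needed for this path.
```
import onnx
from tvm import relay

def network_from_onnx(model_path, input_shapes):
    # input_shapes is a dict such as {"data": (1, 3, 224, 224)}; adjust to your model.
    onnx_model = onnx.load(model_path)
    # from_onnx returns (tvm.IRModule, params) directly.
    mod, params = relay.frontend.from_onnx(onnx_model, shape=input_shapes)
    return mod, params

mod, params = network_from_onnx("model.onnx", {"data": (1, 3, 224, 224)})
```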
When I auto-tune my own ONNX model, it finished:
```
[Task 20/22]  Current/Best:    3.86/  14.62 GFLOPS | Progress: (5/5) | 4.90 s Done.
[Task 21/22]  Current/Best:    7.47/  12.78 GFLOPS | Progress: (5/5) | 2.42 s Done.
[Task 22/22]  Current/Best:    2.07/   2.07 GFLOPS | Progress: (5/5) | 2.5…
```
@anijain2305 @masahi
[[topi] add ARM v8.2 udot (uint8) support
#3978](https://github.com/apache/incubator-tvm/pull/3978)
As this commit says, the ARM platform supports udot (uint8). Can I conclude that ARM can achieve an int8 speedup thanks to the udot (uint8) support, and if so, what is the right way to enable it?
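As far as I understand that PR, the int8 schedules pick udot through the target attributes. A sketch of the target I would try (the exact `-mtriple`/`-mattr` values are assumptions that depend on your board and TVM version; I believe older TVM releases spell `-mtriple` as `-target`):
```
import tvm
from tvm import relay

# Assumed target for an ARM v8.2 CPU with the dot-product extension;
# +dotprod is what should let the int8 schedules emit udot.
target = "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+dotprod"

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)
```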
@anijain2305
Thanks a lot.
I thought TVM's relay quantization would give the same result as a TVM model converted from a pre-quantized one.
I also tested a tvm-int8 model converted from a pytorch QAT model; its speed is the same as the tvm-relay-quantize-int8 model.
I really have no idea how to get the 1.3x-1.5x speedup, no matter whether I start from a pre-quantized model or use relay quantization.
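To be concrete, the "relay quantize" path I mean is roughly this sketch (the `global_scale` value is only an example; `mod`/`params` come from the frontend):
```
from tvm import relay

# relay's own quantization pass, as opposed to importing an
# already-quantized (QAT) PyTorch model through the QNN frontend.
with relay.quantize.qconfig(global_scale=8.0):
    qmod = relay.quantize.quantize(mod, params=params)
```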
[quote="anijain2305, post:27, topic:6256, full:true"]
For rasp3 and rasp4, we saw 1.3x - 1.5x performance speedup going from FP32 to
Int8.
The link comparing QNNPACK and TVM is not upstream'd yet. If I understand
correctly, it will be sometime before the authors of that work will be able to
m…
[/quote]
I also have the same question for a long time.
[quote="anijain2305, post:39, topic:6256, full:true"]
QNNPACK is for ARM, whereas VNNI instructions are for Intel. So, not exactly
that reason. But, the underlying statement might still be the case, that we
dont have good TVM schedules.
Regarding schedules to get same speedup as QNNPACK, we c…
[/quote]
[quote="janimesh, post:5, topic:3920"]
MobileNet models have slowdown because they use Depthwise convolution that has
not been configured to use VNNI instructions.
[/quote]
This might be the reason why tvm is slower than qnnpack. See
[this link](https://discuss.tvm.ai/t/quantization-story/3920/5).
[quote="anijain2305, post:36, topic:6256, full:true"]
Yes, that seems plausible. Please note that one might also make FP32 schedule
better by working on low-level optimizations :) So, it is relative.
[/quote]
Can I define a new schedule to optimize performance and reach the same speed as QNNPACK?
[quote="anijain2305, post:34, topic:6256, full:true"]
Yeah, the work by AliOS is not available yet. They worked a lot on very
low-level optimizations. Over time, this work will hopefully be upstreamed. For
now, on master, QNNPACK is faster.
[/quote]
You also said **For rasp3 and rasp4, we saw…**
[quote="kindlehe, post:19, topic:6256, full:true"]
@anijain2305
How much speedup does INT8 give compared to FP32 on rasp4? 1.5×?
I saw a speedup conclusion
[here](https://github.com/tvmai/meetup-slides/tree/master/tvm-meetup-shanghai-Nov-16-2019)
saying that tvm is about 1.3× (=2.08/1.60) faster than QNNPACK for mobilenet-v2 @ rasp3b + AARCH64…
[/quote]
[quote="anijain2305, post:31, topic:6256, full:true"]
Yes, thats the selling point of TVM.
TVM community works together on these TVM schedules. As we get more people
interested in quantization, we can add more TVM schedules, for e.g., avx2
machine you are talking about. We dont want to fully r…
[/quote]
[quote="anijain2305, post:27, topic:6256, full:true"]
For rasp3 and rasp4, we saw 1.3x - 1.5x performance speedup going from FP32 to
Int8.
The link comparing QNNPACK and TVM is not upstream'd yet. If I understand
correctly, it will be sometime before the authors of that work will be able to
m…
[/quote]
[quote="masahi, post:28, topic:6256, full:true"]
[quote="kindlehe, post:26, topic:6256"]
Will tvm consider integrating FBGEMM in the future to get the same heavy lifting that pytorch has done to achieve its high speedup on avx2 devices?
[/quote]
No. We should rather improve our avx2 schedule…
[/quote]
[quote="masahi, post:25, topic:6256"]
https://github.com/pytorch/FBGEMM
[/quote]
Will tvm consider integrating FBGEMM in the future to get the same heavy lifting that pytorch has done to achieve its high speedup on avx2 devices?
@masahi
I wonder why pytorch can run so fast.
Is it because pytorch uses int8 on the same MacBook Pro, or some other speed-up technique?
The speed was tested on 2 cores for tvm and 1 core for torch,
so tvm@mobilenet-v3 is faster than torch@mobilenet-v3.
@masahi @anijain2305
I am not very sure whether INT8 is used in `perf_bench`, because I see this log:
```
autotvm:Cannot find config for target=llvm -mcpu=core-avx2,
workload=('dense_nopack.x86', ('TENSOR', (1, 1280), 'int16'), ('TENSOR', (1000,
1280), 'int16'), None, 'int32'). A fallback configuration is used, which may bring great performance regression.
```
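One quick, sketch-level way I check whether the converted module is really int8 is to dump the Relay module text and look for `qnn` ops or `int8` dtypes (`mod` being the IRModule returned by the frontend):
```
# Dump the module and search for int8 / qnn markers.
text = mod.astext(show_meta_data=False)
print("int8 present:", "int8" in text)
print("qnn ops present:", "qnn." in text)
```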
@masahi I set `os.environ["TVM_NUM_THREADS"] = str(2)`, but it does not help the speed.
I also watched the cpu% of `tvm_model.module.time_evaluator` and `pt_model(inp)` with the `top` command;
the cpu% is <= 100%, which may mean that both tvm and torch only use one thread for inference.
Here is the…
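For a fairer comparison, this is the sketch I plan to use to pin both runtimes to the same thread count (my assumption is that `TVM_NUM_THREADS` has to be set before `tvm` is imported):
```
import os
os.environ["TVM_NUM_THREADS"] = "1"   # must be set before importing tvm

import torch
import tvm

torch.set_num_threads(1)  # match the torch thread count to tvm's
```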
How much speedup does INT8 give compared to FP32 on rasp4? 1.5×?
I saw a speedup conclusion
[here](https://github.com/tvmai/meetup-slides/tree/master/tvm-meetup-shanghai-Nov-16-2019)
saying that tvm is about 1.3× (=2.08/1.60) faster than QNNPACK for mobilenet-v2 @ rasp3b + AARCH64.
They reported an apparent speed…
Thanks very much!
I will check TVM_NUM_THREADS tomorrow morning.
Have you ever compared the tvm speed of FP32 and INT8 on an Android ARM CPU? Do you think tvm@INT8 will be faster than tvm@FP32 on an Android device?
Here is the speed comparison of the quantized pytorch model and the converted tvm model on a MacBook Pro.
I have no idea why tvm is faster than torch for mobilenet-v3, but slower for resnet-18, resnet-50, and mobilenet-v2.

This problem was solved by rebuilding tvm in the correct way.
I revised the input name in
[imagenet_test.py](https://github.com/Edgecortix-Inc/pytorch_quantization/blob/master/tvm_qnn_evaluation/imagenet_test.py)
as follows:

But I get the following error while executing the resnet18 model:

@torch-nightly (v1.6) @macbook pro, and I get the 2-4x speed-up that the
[static_quantization_tutorial](https://pytorch.org/tutorials/advanced…) describes.
Yes, thanks again for your reply.
Thank you very much for your detailed reply!
I will try your suggestion and these scripts later.
However, I still have some questions:
1. Should I use `Post-training static quantization` or `Quantization-aware training` for my own model, as in the
[static_quantization_tutorial](https://pytor…)? (A minimal PTQ sketch follows below.)
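For reference, the post-training static quantization flow I am asking about is roughly this sketch of the eager-mode PyTorch API (`model` and `calibration_loader` are placeholders):
```
import torch

model.eval()  # `model` is the FP32 model to quantize (placeholder)
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)

# Calibrate with a few representative batches (placeholder loop).
for inp, _ in calibration_loader:
    model(inp)

torch.quantization.convert(model, inplace=True)
```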
[[Torch, QNN] Add support for quantized models via QNN
#4977](https://github.com/apache/incubator-tvm/pull/4977) gives the performance of quantized Torch models and of the converted tvm quantized models, but does not give a speed comparison between them.
Where could I get more speed comparisons of these two kinds of quantized models?