[TVM Discuss] [Questions] [SOLVED] Auto-tuning CUDA: Poor Performance

2020-04-17 Thread kindlehe via TVM Discuss
Hello, I also met an ONNX auto-tune build problem. Can you do relay.build after auto-tune has finished?

```
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(
        mod, target=target, params=params)
```

More info about my problem: [Auto-tune finished, but Build error occurs for my own onnx model](…
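For reference, a minimal sketch of compiling with the tuning records applied, assuming the auto-tune log was written to a hypothetical `tune.log`:

```
import tvm
from tvm import autotvm, relay

# Apply the best records found during auto-tuning; without this context,
# relay.build falls back to the untuned default schedules.
with autotvm.apply_history_best("tune.log"):  # hypothetical log path
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
```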

[TVM Discuss] [Questions] Auto-tune finished, but Build error occurs for my own onnx model

2020-04-17 Thread kindlehe via TVM Discuss
I guess the error may be caused by a difference in model loading between the ONNX and MXNet models, in particular around `relay.Function` and `tvm.IRModule.from_expr`. **1. I am not sure what they are used for, and what should I write for an ONNX model?**

```
def customed_network_from_onnx(model_path, input_shapes, …
```
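In case it helps, a minimal sketch of loading an ONNX model into Relay: the ONNX frontend returns the `IRModule` and params directly, so no manual `relay.Function` or `tvm.IRModule.from_expr` is needed (`model_path` and `input_shapes` mirror the snippet above):

```
import onnx
from tvm import relay

onnx_model = onnx.load(model_path)   # e.g. "model.onnx"
shape_dict = dict(input_shapes)      # e.g. {"input": (1, 3, 224, 224)}

# from_onnx builds the IRModule and extracts the params for you.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
```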

[TVM Discuss] [Questions] Auto-tune finished, but Build error occurs for my own onnx model

2020-04-17 Thread kindlehe via TVM Discuss
When I auto-tune my own ONNX model, it finished:

```
[Task 20/22] Current/Best:  3.86/ 14.62 GFLOPS | Progress: (5/5) | 4.90 s Done.
[Task 21/22] Current/Best:  7.47/ 12.78 GFLOPS | Progress: (5/5) | 2.42 s Done.
[Task 22/22] Current/Best:  2.07/  2.07 GFLOPS | Progress: (5/5) | 2.5…
```
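As an aside, `(5/5)` suggests only five trials per task, which is usually far too few; a minimal sketch of tuning the extracted tasks with more trials (the trial count, tuner choice, and log name are all illustrative):

```
from tvm import autotvm

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1),
)

for task in tasks:  # tasks from autotvm.task.extract_from_program
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=1000,  # illustrative; much more than the 5 trials above
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("tune.log")],
    )
```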

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-15 Thread kindlehe via TVM Discuss
@anijain2305 @masahi As [[topi] add ARM v8.2 udot (uint8) support #3978](https://github.com/apache/incubator-tvm/pull/3978) says, the ARM platform supports udot (uint8). Can I assume that ARM can achieve an int8 speedup from the udot (uint8) support, and if so, what is the right way to enable it?
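If useful, a minimal sketch of a build target that enables those instructions; the exact triple depends on your device, and `+v8.2a,+dotprod` are the LLVM attributes the dot-product schedules rely on (treat the flags as an assumption to verify against the PR):

```
from tvm import relay

# aarch64 target with ARM v8.2 dot-product support enabled via LLVM -mattr.
target = "llvm -device=arm_cpu -target=aarch64-linux-gnu -mattr=+v8.2a,+dotprod"

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)
```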

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-15 Thread kindlehe via TVM Discuss
@anijain2305 Thanks a lot. I thought the TVM relay quantize was the same as a TVM model converted from a pre-quantized one. I also tested a TVM int8 model built from a PyTorch QAT model; its speed is the same as the tvm-relay-quantize int8 model. I really have no idea how to get the 1.3x - 1.5x speedup, no matter whether pre-q…
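For context, a minimal sketch of the TVM-side relay quantize path mentioned here, as opposed to importing a model already quantized in PyTorch (the qconfig values are illustrative, not a recommendation):

```
from tvm import relay

# Quantize an FP32 Relay module inside TVM itself.
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    mod = relay.quantize.quantize(mod, params)
```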

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-15 Thread kindlehe via TVM Discuss
[quote="anijain2305, post:27, topic:6256, full:true"] For rasp3 and rasp4, we saw 1.3x - 1.5x performance speedup going from FP32 to Int8. The link comparing QNNPACK and TVM is not upstream'd yet. If I understand correctly, it will be sometime before the authors of that work will be able to m

[TVM Discuss] [Questions] Optimization 0-3?

2020-04-10 Thread kindlehe via TVM Discuss
I have also had the same question for a long time.

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-10 Thread kindlehe via TVM Discuss
[quote="anijain2305, post:39, topic:6256, full:true"] QNNPACK is for ARM, whereas VNNI instructions are for Intel. So, not exactly that reason. But, the underlying statement might still be the case, that we dont have good TVM schedules. Regarding schedules to get same speedup as QNNPACK, we c

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-10 Thread kindlehe via TVM Discuss
[quote="janimesh, post:5, topic:3920"] MobileNet models have slowdown because they use Depthwise convolution that has not been configured to use VNNI instructions. [/quote] This might be the reason why tvm is slower than qnnpack. see [link](https://discuss.tvm.ai/t/quantization-story/3920/5?u=

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
[quote="anijain2305, post:36, topic:6256, full:true"] Yes, that seems plausible. Please note that one might also make FP32 schedule better by working on low-level optimizations :) So, it is relative. [/quote] Can I define a new schedule to optimize performance to get the same speed as QNNPACK

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
[quote="anijain2305, post:34, topic:6256, full:true"] Yeah, the work by AliOS is not available yet. They worked a lot on very low-level optimizations. Over time, this work will hopefully be upstreamed. For now, on master, QNNPACK is faster. [/quote] Your also said **For rasp3 and rasp4, we saw

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
[quote="kindlehe, post:19, topic:6256, full:true"] @anijain2305 How much speedup does FP32 compared INT8 at rasp4?1.5×? I saw some speedup conclusion [here](https://github.com/tvmai/meetup-slides/tree/master/tvm-meetup-shanghai-Nov-16-2019) saying that tvm is about 1.3×(=2.08/1.60)at mobilene

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
[quote="anijain2305, post:31, topic:6256, full:true"] Yes, thats the selling point of TVM. TVM community works together on these TVM schedules. As we get more people interested in quantization, we can add more TVM schedules, for e.g., avx2 machine you are talking about. We dont want to fully r

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
[quote="anijain2305, post:27, topic:6256, full:true"] For rasp3 and rasp4, we saw 1.3x - 1.5x performance speedup going from FP32 to Int8. The link comparing QNNPACK and TVM is not upstream'd yet. If I understand correctly, it will be sometime before the authors of that work will be able to m

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
[quote="masahi, post:28, topic:6256, full:true"] [quote="kindlehe, post:26, topic:6256"] Will tvm consider integrating FBGEMM to get the same heavy lifting in the future as pytorch has done to support the same high speedup in avx2 device? [/quote] No. We should rather improve our avx2 schedule

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
[quote="masahi, post:25, topic:6256"] https://github.com/pytorch/FBGEMM [/quote] Will tvm consider integrating FBGEMM to get the same heavy lifting in the future as pytorch has done to support the same high speedup in avx2 device? --- [Visit Topic](https://discuss.tvm.ai/t/is-there-any-s

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
@masahi I wonder why PyTorch can run so fast. Is it because PyTorch uses int8 on the same MacBook Pro, or some other speed-up technique?

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
The speed was tested on 2 cores for TVM and 1 core for Torch, so tvm@mobilenet-v3 is faster than torch@mobilenet-v3.

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
@masahi @anijain2305 I am not very sure whether INT8 is used in `perf_bench`, because I see these logs:

```
autotvm:Cannot find config for target=llvm -mcpu=core-avx2, workload=('dense_nopack.x86', ('TENSOR', (1, 1280), 'int16'), ('TENSOR', (1000, 1280), 'int16'), None, 'int32'). A fallback con…
```
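One hedged way to check which dtypes actually ended up in the module is to print the Relay IR after conversion (assuming `mod` is the module returned by the frontend); int16 workloads like the one in the log can appear when QNN legalizes int8 operators for CPUs without fast int8 instructions, which is an assumption worth verifying:

```
# Look for int8/uint8 tensor types and qnn.* operators in the output.
print(mod["main"])
```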

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
@masahi I set `os.environ["TVM_NUM_THREADS"] = str(2)`, but it does not help the speed. I also watched the CPU% of `tvm_model.module.time_evaluator` and `pt_model(inp)` with the `top` command; the CPU% is <= 100%, which may mean that both TVM and Torch use only one thread for inference. Here is the…
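A caveat worth checking, shown as a minimal sketch: `TVM_NUM_THREADS` is read when the TVM runtime initializes its thread pool, so it must be set before `tvm` is imported, and `torch.set_num_threads` can pin the Torch side for a like-for-like comparison:

```
import os
os.environ["TVM_NUM_THREADS"] = "2"  # must be set before importing tvm

import tvm
import torch

torch.set_num_threads(2)  # match thread counts for a fair comparison
```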

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
How much speedup does INT8 give compared with FP32 on rasp4? 1.5×? I saw a speedup conclusion [here](https://github.com/tvmai/meetup-slides/tree/master/tvm-meetup-shanghai-Nov-16-2019) saying that TVM is about 1.3× (=2.08/1.60) faster than QNNPACK at mobilenet-v2 on rasp3b+AARCH64. They reported apparent speed…

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
Thanks very much! I will check TVM_NUM_THREADS tomorrow morning. Have you ever compared the TVM speed of FP32 and INT8 on an Android ARM CPU? Do you think tvm@INT8 will be faster than tvm@FP32 on an Android device?

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
Here is the speed comparison of the quantized PyTorch models and the converted TVM models on a MacBook Pro. I have no idea why TVM is faster than Torch for mobilenet-v3, but slower for resnet-18, resnet-50 and mobilenet-v2. ![image|690x396](upload://2ZCtF54A2wBVxKC0KDZZ23jyriT.png)
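For what it's worth, a minimal sketch of the kind of timing loop behind such a comparison (`ctx`, `inp`, and the iteration counts are illustrative):

```
import time
import numpy as np
import torch

# TVM side: time_evaluator runs the compiled graph repeatedly in C++
# and reports statistics, avoiding Python-loop overhead.
ftimer = tvm_model.module.time_evaluator("run", ctx, number=100, repeat=3)
ms = np.array(ftimer().results) * 1000
print("TVM: %.2f ms (+/- %.2f ms)" % (ms.mean(), ms.std()))

# Torch side: plain wall-clock timing under no_grad.
with torch.no_grad():
    start = time.time()
    for _ in range(100):
        pt_model(inp)
print("Torch: %.2f ms" % ((time.time() - start) / 100 * 1000))
```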

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread kindlehe via TVM Discuss
This problem was solved by rebuilding TVM in the correct way.

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-08 Thread kindlehe via TVM Discuss
I revised the input name in [imagenet_test.py](https://github.com/Edgecortix-Inc/pytorch_quantization/blob/master/tvm_qnn_evaluation/imagenet_test.py) as follows: ![image|690x218](upload://zZ5U3HUpfJvYzsQhgEyqfR3s6PI.png) But I get the following error while executing the resnet18 model: ![image…

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-08 Thread kindlehe via TVM Discuss
Yes, thanks again for your reply. I just verified [tutorial_eager.py](https://github.com/Edgecortix-Inc/pytorch_quantization/blob/master/tutorial_eager.py) with torch-nightly (v1.6) on a MacBook Pro, and got the 2-4x speed-up described in the [static_quantization_tutorial](https://pytorch.org/tutorials/advanced…

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-08 Thread kindlehe via TVM Discuss
Yes, thanks again for your reply.

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-07 Thread kindlehe via TVM Discuss
Thank you very much for your detailed reply! I will try these scripts later, as you suggest. However, I still have some questions: 1. Should I use `Post-training static quantization` or `Quantization-aware training` for my own model, as in the [static_quantization_tutorial](https://pytor…
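For reference, a minimal sketch of the post-training static quantization flow from that tutorial (the `fbgemm` backend targets x86; `model` and `calibration_loader` are placeholders):

```
import torch

model.eval()
# Post-training static quantization: observe activations on calibration
# data, then convert weights and activations to int8.
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)

with torch.no_grad():
    for images, _ in calibration_loader:  # placeholder calibration data
        model(images)

torch.quantization.convert(model, inplace=True)
```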

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-07 Thread kindlehe via TVM Discuss
[[Torch, QNN] Add support for quantized models via QNN #4977](https://github.com/apache/incubator-tvm/pull/4977) gives the performance of quantized Torch models and of the converted TVM quantized models, but did not give the speed comparison between them. Where could I get more speed comparisons of the two kinds of q…