Thanks for sharing. The failure is while calling tune_graph. The graph tuning
assumes the data to be float32.
Additionally, last time I tried, graph tuning can't work with QNN ops. One way
to handle this is to call QnnCanonicalize (python/tvm/relay/qnn/transform.py)
before calling graph tuning.
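A minimal sketch of that ordering, assuming the pass is exposed as
`relay.qnn.transform.CanonicalizeOps` (check python/tvm/relay/qnn/transform.py
for the exact name in your TVM version); the helper name below is just
illustrative:

```
import tvm
from tvm import relay

def prepare_for_graph_tuning(mod):
    """Illustrative helper: lower qnn.* ops to plain Relay before graph tuning."""
    # Rewrite qnn.conv2d / qnn.dense etc. into standard Relay ops.
    mod = relay.qnn.transform.CanonicalizeOps()(mod)
    # Fold the constants introduced by canonicalization.
    mod = relay.transform.FoldConstant()(mod)
    return mod

# mod, params = relay.frontend.from_tflite(...)  # pre-quantized model
# mod = prepare_for_graph_tuning(mod)
# ...then hand `mod` to the graph tuner (e.g. autotvm's DPTuner) as for FP32.
```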
---
Hmm, this is weird. My script seems to work well. Is it possible for you to
share the script? If not, does your run reach the point of printing
relay_NHWC.txt for the quantized model, or does it fail before that?
---
[Visit Topic](https://discuss.tvm.ai/t/autotvm-task-extract-from-program-in-tflite/6578/15)
[quote="alopez_13, post:7, topic:6578"]
This is part of the Relay code:
```
%0 = layout_transform(%input, src_layout="NHWC", dst_layout="NCHW");
%1 = layout_transform(%v_param_1, src_layout="HWIO", dst_layout="OIHW");
%2 = qnn.conv2d(%0, %1, 128, 122, 0.0078125f, 0.0339689f, strides=[2, 2]
```
[/quote]
Just to confirm, can you please double-check your script?
We specify the input shape and dtype for the model while parsing (`from_tflite`).
So, even though most of the AutoTVM script can be the same, there needs to be a
small change when passing the input shape and dtype for the FP32 and quantized
models.
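A minimal sketch of that change; the input name, shape, and file names are
placeholders, and the `tflite` package's parsing call may differ slightly
across versions:

```
import tflite
from tvm import relay

input_name = "input"                 # placeholder: the model's input tensor name
input_shape = (1, 224, 224, 3)       # placeholder: NHWC, the usual TFLite layout

def load_tflite(path):
    # Parse a .tflite flatbuffer with the `tflite` python package.
    with open(path, "rb") as f:
        return tflite.Model.GetRootAsModel(f.read(), 0)

# FP32 model
mod, params = relay.frontend.from_tflite(
    load_tflite("model_fp32.tflite"),
    shape_dict={input_name: input_shape},
    dtype_dict={input_name: "float32"})

# Pre-quantized model: same call, but the input dtype must be uint8
qmod, qparams = relay.frontend.from_tflite(
    load_tflite("model_quant.tflite"),
    shape_dict={input_name: input_shape},
    dtype_dict={input_name: "uint8"})
```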
---
IIUC, simple compilation (no auto-tuning) of both the FP32 and quantized models
works.
But auto-tuning + compilation fails for the quantized model (while the same
script works for FP32), right?
---
[Visit Topic](https://discuss.tvm.ai/t/autotvm-task-extract-from-program-in-tflite/6578/11)
Are you giving the right input dtypes to the model? TFLite quantized models
need the `uint8` dtype.
---
[Visit Topic](https://discuss.tvm.ai/t/autotvm-task-extract-from-program-in-tflite/6578/9)
> [[topi] add ARM v8.2 udot (uint8) support #3978](https://github.com/apache/incubator-tvm/pull/3978)

This works if you have a machine/device with ARM v8.2 and the DOT instruction.
Rasp3b and 4b don't have it.
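For reference, a hedged sketch of an AArch64 target string that exposes the
dot-product extension to LLVM; the exact `-mtriple`/`-mattr` values depend on
your board and toolchain, and this only helps on hardware that has the feature:

```
# Illustrative AArch64 target with the v8.2 dot-product feature enabled.
target = "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+dotprod"
```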
---
[Visit Topic](https://discuss.tvm.ai/t/is-there-any-speed-comparison-of-quantization-on-cpu/6256)
I have mostly worked on pre-quantized models, so I can't comment on the
performance of Relay-quantized models on ARM. There might be a few missing
pieces there.
I am planning to write a tutorial by next week on how to read pre-quantized
models from TFLite. You can also try @masahi's tutorial.
---
It is very difficult to estimate; different people code at a different pace.
I can share my experience, but I am not sure if you should treat it as
representative. My first task in TVM was to use Intel VNNI instructions for
the conv2d schedule, and it took me around a month. I am not sure how involved
the QNNPACK work would be.
---
QNNPACK is for ARM, whereas the VNNI instructions are for Intel, so that is not
exactly the reason. But the underlying statement might still be the case: we
don't have good TVM schedules yet.
Regarding schedules that reach the same speedup as QNNPACK, we can write an
assembly implementation inside the TVM schedule.
---
Yes, that seems plausible. Please note that one might also make the FP32
schedules better by working on low-level optimizations :) So, it is relative.
---
[Visit Topic](https://discuss.tvm.ai/t/is-there-any-speed-comparison-of-quantization-on-cpu/6256/36)
Yeah, the work by AliOS is not available yet. They worked a lot on very
low-level optimizations. Over time, this work will hopefully be upstreamed. For
now, on master, QNNPACK is faster.
---
[Visit Topic](https://discuss.tvm.ai/t/is-there-any-speed-comparison-of-quantization-on-cpu/6256)
Yes, that's the selling point of TVM.
The TVM community works together on these TVM schedules. As we get more people
interested in quantization, we can add more schedules, e.g., for the AVX2
machine you are talking about. We don't want to fully rely on FBGEMM or
QNNPACK, because it might cause conf…
---
For rasp3 and rasp4, we saw a 1.3x - 1.5x performance speedup going from FP32
to Int8.
The linked comparison of QNNPACK and TVM is not upstreamed yet. If I understand
correctly, it will be some time before the authors of that work are able to
upstream it. There are some differences in the underlying…
---
@kindlehe TVM might not be optimized for the target 'llvm -mcpu=core-avx2'. I
would suggest running it on CascadeLake; you would see a major benefit.
For rasp4, if you are comparing FP32 vs Int8, yes, I have seen performance
improvements. However, if you compare PyTorch int8 (backed by QNNPACK) vs TVM…
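For context, a sketch of the two target strings being contrasted; the
cascadelake name assumes an LLVM recent enough (roughly 8+) to know that CPU:

```
# AVX2-only target: no int8 dot-product instructions available to the codegen.
target_avx2 = "llvm -mcpu=core-avx2"
# Cascade Lake target: lets TVM/LLVM use the VNNI int8 instructions.
target_vnni = "llvm -mcpu=cascadelake"
```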
---
You are correct. I forgot about the PyTorch frontend for quantized models. This
is true for MXNet as well.
We can also make a tutorial for all frameworks. You can take care of PyTorch,
and I can take care of MXNet (similar to PyTorch) and TFLite (easy). It can be
just one tutorial with different sections.
---
Thanks @kindlehe @masahi
Masa explained it correctly. For a long time, processors had higher FP32
throughput than Int8 throughput. So, it is not fair to assume that quantization
will give you performance benefits on all machines. Check Intel VNNI, Nvidia
DP4A and tensor cores, and the ARM DOT-product instructions.
---
Thank you. Can you also register fast_tanh?
Also, a better way of using the FastMath pass is as follows:
https://github.com/apache/incubator-tvm/blob/a5d7bdab8771430be052c22d07ebe2df6b320be4/tests/python/relay/test_pass_fast_math.py#L32-L33
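The linked test enables the pass through the pass context rather than calling
it directly. A minimal sketch of that usage, assuming fast_tanh is registered
in the FastMath conversion (older TVM versions spell the context as
`relay.build_config` with the same arguments):

```
import tvm
from tvm import relay

# A tiny graph containing tanh, so FastMath has something to rewrite.
x = relay.var("x", shape=(1, 64), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.tanh(x)))

# Request the FastMath pass via the pass context instead of invoking it manually.
with tvm.transform.PassContext(opt_level=3, required_pass=["FastMath"]):
    opt_mod, _ = relay.optimize(mod, target="llvm", params=None)

# If the fast_tanh registration is in place, the optimized IR should use it.
print("fast_tanh" in opt_mod.astext())
```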
---
Thanks for sharing your thoughts.
Let me share some more background. To achieve high performance for
compute-heavy ops (close to hand-written kernels like MKLDNN or ACL), we need
to perform vector register tiling. This is one more level lower than cache
tiling. Here, we have to carefully craft…
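A toy sketch of register tiling below cache tiling on a small matmul; the
32x32 cache tile and 4x8 register block are arbitrary, untuned choices:

```
import tvm
from tvm import te

# Toy matmul to illustrate the idea.
M = N = K = 64
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)

# Level 1: cache tiling.
io, jo, ii, ji = s[C].tile(C.op.axis[0], C.op.axis[1], x_factor=32, y_factor=32)
# Level 2: register tiling -- a small block that should stay in vector registers.
iio, jio, iii, jii = s[C].tile(ii, ji, x_factor=4, y_factor=8)
ko, ki = s[C].split(C.op.reduce_axis[0], factor=4)
s[C].reorder(io, jo, iio, jio, ko, ki, iii, jii)

# The register block is fully unrolled and vectorized; this is the part that
# has to be crafted carefully to match the machine's vector width.
s[C].unroll(iii)
s[C].vectorize(jii)
print(tvm.lower(s, [A, B, C], simple_mode=True))
```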
---
I have been working on TVM schedules for ARM. One thing that I notice is that
LLVM has its own unrolling heuristics, which can completely mess up the
analysis that one does for unrolling in TVM.
For example, a developer can choose to unroll a particular axis with the goal
of better reuse utilization.
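For illustration, this is the kind of explicit unrolling decision a TVM
schedule can make before LLVM sees the code; the op and the factor of 8 are
arbitrary:

```
import tvm
from tvm import te

# Toy elementwise op; the split factor of 8 is arbitrary.
n = 256
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

s = te.create_schedule(B.op)
io, ii = s[B].split(B.op.axis[0], factor=8)
# Ask TVM to unroll the inner axis itself, so the unrolled structure is fixed
# before the LLVM backend applies its own unrolling heuristics.
s[B].unroll(ii)
print(tvm.lower(s, [A, B], simple_mode=True))
```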