Hello.
On RK3399, I found a performance decrease during inference with the VGG-16
model.
Performance was measured using the test code below.
```
import tvm
import tvm.relay as relay
from tvm.contrib import graph_runtime
import numpy as np
import topi
from tvm.relay.testing
```
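Since the script above is cut off, here is a minimal sketch of how such a measurement could look; the model from `relay.testing`, batch size 1, and running directly on the RK3399 board are my assumptions, not the original code:

```
import numpy as np
import tvm
import tvm.relay as relay
from tvm.contrib import graph_runtime
from tvm.relay import testing

# Assumed workload: relay.testing's VGG-16 with batch size 1.
mod, params = testing.vgg.get_workload(num_layers=16, batch_size=1)

# Preset target for the RK3399 CPU; assumes the script runs on the board itself.
target = tvm.target.arm_cpu("rk3399")
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)

ctx = tvm.cpu(0)
module = graph_runtime.create(graph, lib, ctx)
module.set_input("data", np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
module.set_input(**params)

# Average over repeated runs to get a stable latency number.
ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
print("mean inference time: %.2f ms" % (np.mean(ftimer().results) * 1000))
```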
Hi, I am hitting the same problem. Have you solved it yet?
---
Thank you for your reply.
[quote="haichen, post:10, topic:6161"]
The strategy to select implementations for `conv2d` op is defined at [here
](https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/op/strategy/cuda.py#L91-L198)
[/quote]
Which function in AutoTVM will use this strategy?
Thanks very much for your detailed reply!
I will try these scripts later, following your suggestions.
However, I still have a few questions:
1. Should I use `Post-training static quantization` or `Quantization-aware
training` for my own model, as in the
[static_quantization_tutorial](https://pytor
@adb The PR is up https://github.com/apache/incubator-tvm/pull/5272
---
Thanks for the explanation.
---
The implementations for CUDA are defined in `topi/python/topi/cuda`. The
strategy to select implementations for the `conv2d` op is defined at
[here](https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/op/strategy/cuda.py#L91-L198).
I don't understand your second question very well.
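One way to see, from the user side, which of those registered implementations AutoTVM turns into tunable tasks is task extraction. This is only a hedged sketch: ResNet-18 from `relay.testing` is just a stand-in network, and it needs a CUDA-enabled TVM build.

```
import tvm
from tvm import relay, autotvm
from tvm.relay import testing

# Any network with conv2d ops will do; ResNet-18 is a convenient stand-in.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

# Task extraction traces the build with the CUDA strategy and records an
# AutoTVM task per tunable conv2d workload, named after the chosen
# implementation (e.g. conv2d_nchw.cuda).
tasks = autotvm.task.extract_from_program(
    mod["main"], target="cuda", params=params, ops=(relay.op.get("nn.conv2d"),)
)
for task in tasks:
    print(task.name, task.args)
```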
We should pass the ObjectRef in most parts of the codebase. The only
exception for now is the functor dispatching classes, where the first argument
is the object node class itself and can be viewed as a weak reference to the
original node.
There is some interest in moving the functor dispatching
Hi,
Here is an example
```
Expr Rewrite_(const CallNode* call_node, const Expr& post) final {
  const Call& ref_call = GetRef<Call>(call_node);
  ...
  ...
  const auto* post_node = post.as<CallNode>();
```
This is a fairly standard convention in the TVM C++ codebase. I am really confused
about when we should pass a reference and when a pointer.
You are correct. I forgot about the PyTorch frontend for quantizing. This is true
for MXNet as well.
We can also make a tutorial for all frameworks. You can take care of PyTorch, and I
can take care of MXNet (similar to PyTorch) and TFLite (easy). It can be just
one tutorial with different sections.
Ah, I get it.
All compilation that starts from Relay calls into relay.build(...), which
goes through what I called the "normal" build flow that starts with
high-level optimizations. That is followed by low-level
optimizations, mainly at the TOPI level.
The VTA path calls vta.build
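As a rough illustration of the "low-level" half of that flow, the sketch below uses a toy tensor-expression compute (my own example, not a real TOPI operator) just to show the `tvm.lower` entry point that the TOPI/TE-level passes go through:

```
import tvm
from tvm import te

# A toy tensor-expression compute standing in for what a TOPI operator defines.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)

# tvm.lower runs the TE-level passes and prints the low-level IR; this is the
# stage that follows Relay's high-level optimizations inside relay.build.
print(tvm.lower(s, [A, B], simple_mode=True))
```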
[quote="anijain2305, post:4, topic:6256"]
do not see tutorial to be very different from FP32 compilation
[/quote]
Yes, for TFLite, where you can just download a pre-quantized model from their zoo,
I don't think it would be different from FP32. For PyTorch it is a bit more
complicated :) All the b
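For anyone following along, the PyTorch side roughly looks like the sketch below, assuming a pre-quantized torchvision model is acceptable; the model choice and the input name "input" are my own placeholders:

```
import torch
import torchvision
from tvm import relay

# torchvision ships pre-quantized models, so no calibration is needed here.
model = torchvision.models.quantization.resnet18(pretrained=True, quantize=True).eval()

# TVM's PyTorch frontend consumes a traced (TorchScript) module.
inp = torch.rand(1, 3, 224, 224)
script_module = torch.jit.trace(model, inp).eval()

# "input" is just a placeholder name; the shape must match the traced input.
mod, params = relay.frontend.from_pytorch(script_module, [("input", (1, 3, 224, 224))])
```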
I see a path on the normal TVM build side:
**tvm/python/tvm/relay/build_module.py** -->
**tvm/src/relay/backend/build_module.cc** Lower(...) --> LowerInternal(...) -->
**tvm/python/tvm/relay/backend/_backend.py** lower(...) -->
**tvm/python/tvm/driver/build_module.py** lower(...)
The last one
Thanks @kindlehe @masahi
Masa explained it correctly. For a long time, processors had higher FP32
throughput than int8 throughput. So it is not fair to assume that
quantization will give you performance benefits on all machines. Check
Intel VNNI, Nvidia DP4A and tensor cores, and ARM
Thanks @matt-arm! Will be trying out some patterns today.
---
Hi @masahi thanks for bringing this to my attention. Looks like this PR could
work for us too. As a first pass we hope to target the most common fusion
patterns as in your PR.
---
This was very confusing when I started reading the TVM source code, trying to
figure out the build paths.
The normal build flow seems to use **tvm/python/tvm/relay/build_module.py**, which
is itself a wrapper for C++ implementations under the hood, such as
**tvm/src/relay/backend/build_module.cc**
[quote="kindlehe, post:1, topic:6256"]
but it is very hard to find official tutorial about how to do quantization for
pytorch or tf correctiy
[/quote]
Yes, this is a good point. @anijain2305 do we have a plan to send a tutorial
for how to convert from pre-quantized models?
---
Yes, without HW support for int8, you shouldn't expect int8 to be any faster
than FP32. For AVX2, Torch is much faster than TVM for int8. For AVX-512, where
int8 does make a difference, TVM is much faster.
I have a script:
https://github.com/Edgecortix-Inc/pytorch_quantization/tree/master/tvm_
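For anyone reproducing this, the difference usually comes down to the `-mcpu` flag in the LLVM target. A small sketch; the exact CPU strings depend on your machine:

```
import tvm

# Plain AVX2: no fast int8 path, so int8 tends to lose to FP32 here.
target_avx2 = tvm.target.create("llvm -mcpu=core-avx2")

# AVX-512 (skylake-avx512, or cascadelake for VNNI) enables TVM's int8 x86 kernels.
target_avx512 = tvm.target.create("llvm -mcpu=skylake-avx512")

# Build the same quantized module with each target via relay.build(...) and compare.
print(target_avx2)
print(target_avx512)
```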
@adb I had an old PR https://github.com/apache/incubator-tvm/pull/4741 which
demonstrates conv + bias + relu fusion the "hard" way (before composite was
introduced). I'll send a new one ASAP after the PRs by @matt-arm are merged.
---
ValueError: don't know how to convert type `<class 'torch.Tensor'>` to object
---
[ [Torch, QNN] Add support for quantized models via QNN
#4977](https://github.com/apache/incubator-tvm/pull/4977) gives the performance of
quantized Torch models and of the converted TVM quantized models, but does not give
a speed comparison between them.
Where could I get more speed comparisons of the two kinds of quantized models?
[python.onnx ] save model
```
%122 = take(%121, %v635, axis=0);
%123 = expand_dims(%122, axis=0);
%124 = shape_of(%blocks.1.0.weight, dtype="int64");
%125 = take(%124, %v647, axis=0);
%126 = expand_dims(%125, axis=0);
%127 = reshape(%120, newshape=[1, 4096, 4, 4]);
%128 =
```
```
%105 = add(%104, 1f);
%106 = reshape(%105, newshape=[8, 1, 512, 1, 1]);
%107 = multiply(%3, %106);
%108 = full(1, shape=[], dtype="float32");
%109 = power(%107, 2f);
%110 = sum(%109);
%111 = add(%110, 1e-08f);
%112 = power(%111, 0.5f);
%113 = divide(%108, %112);
%114 = multiply
```
Status update! I've put up the following two PRs, which hopefully will allow for
composite function annotation:
[5261](https://github.com/apache/incubator-tvm/pull/5261),
[5262](https://github.com/apache/incubator-tvm/pull/5262). Feel free to take a
look.
---
Thank you very much for your help. I am gonna give it a try using the
GraphRuntime C++ API :slight_smile:
---
Now the source code of vectorize for OpenCL looks like:
```
vstore2((vload2(0, ( half*)compute + (ff * 2)) + (vload2(0,
pad_temp_shared_local_local + 0) * ((half2)(input1_shared_local_local[0],
input1_shared_local_local[0], 0, ( half*)compute + (ff * 2));
```
but I want something like:
@zhiics done! It's [here](https://github.com/apache/incubator-tvm/pull/5259).