I dropped a `print` statement into the [default AVX x86 conv2d
schedule](https://github.com/apache/tvm/blob/70884e957aa5c8de9c02c25a14d30563d7300cb9/python/tvm/topi/x86/conv2d_avx_common.py#L87),
so I know that this is the schedule that is being run.
To check if there is an int16 fallback, I
I've been exploring quantization in TVM, and one thing I found is that there is a
special compute/schedule for running int8 conv2d on the CPU
([see
here](https://github.com/apache/tvm/blob/main/python/tvm/topi/x86/conv2d_int8.py#L132)).
From what I can tell, it seems to be pret
If your final output is incorrect, the first step I would try is to see what it
should be in the original framework.
For example, if your model is in PyTorch, pass some input data, and save the
output.
Then, export to TVM, and pass it the same input data. If the output from TVM
is different, then you know something has gone wrong on the TVM side.
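Something like the following is the kind of comparison I mean (just a rough sketch using a torchvision model as a stand-in; the input name `input0` and the tolerances are placeholders):

```python
import numpy as np
import torch
import torchvision
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Reference output from PyTorch
model = torchvision.models.resnet18(pretrained=True).eval()
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    ref = model(x).numpy()

# Export to TVM and run the same input through the compiled module
scripted = torch.jit.trace(model, x)
mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 3, 224, 224))])
lib = relay.build(mod, target="llvm", params=params)
rt = graph_executor.GraphModule(lib["default"](tvm.cpu()))
rt.set_input("input0", x.numpy())
rt.run()
out = rt.get_output(0).numpy()

# Tolerances are a placeholder; pick what makes sense for your model
np.testing.assert_allclose(ref, out, rtol=1e-4, atol=1e-4)
```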
This is a bit outwith my area of experience with TVM; however, I do recall
seeing that TVM had WebGL support, as discussed in [this 2018
blogpost](https://tvm.apache.org/2018/03/12/webgl).
However, [this forum discussion in 2020 discussed deprecating it in favour of
WebGPU](https://discuss.tvm.apache.
You may want to make this message a reply to your [original
thread](https://discuss.tvm.apache.org/t/what-if-the-result-is-not-correct/11858/5)
to make things more coherent for other forum users.
You say that the output is not correct; can you provide a reproducible example
of what model you are running, and how? Perhaps there is an issue there.
To answer your question, [this part of the
documentation](https://tvm.apache.org/docs/arch/debugger.html#how-to-use-debugger)
takes you through how to use the debugger.
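Roughly, the flow from those docs looks like this (a sketch only: `lib` is assumed to come from `relay.build`, and the input name, shape, and dump directory are placeholders):

```python
import numpy as np
import tvm
from tvm.contrib.debugger import debug_executor

# `lib` is assumed to be the result of relay.build(mod, target="llvm", params=params)
dev = tvm.cpu()
m = debug_executor.create(lib.get_graph_json(), lib.get_lib(), dev, dump_root="/tmp/tvmdbg")
m.set_input("data", np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
m.run()
# Per-node outputs and timings are written under the dump_root directory
```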
The [original TVM WASM
blogpost](https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu)
has a link to a [codebase which uses WASM for the host-code, and WebGPU for
the kernels](https://github.com/tqchen/tvm-webgpu-example).
The pipeline might have changed a little since then.
Sure, you can see an example of how to use the `debug_executor` in [this part
of the
docs](https://tvm.apache.org/docs/arch/debugger.html#how-to-use-debugger).
The tensors will be dumped to a file, and you can then load this data using
code such as:
```python
from tvm import relay

# The file name below is a placeholder; point it at the params file written to
# the debug_executor's dump directory
data = relay.load_param_dict(bytearray(open("output_tensors.params", "rb").read()))
```
Hi there, you can use the `debug_executor` to get the intermediate results.
You can see details [discussed in this forum
thread](https://discuss.tvm.apache.org/t/tvm-runtime-print-intermediate-outputs/10124).
You will need to have a ground truth to compare against, depending on what
framework you are using.
This is exactly what I needed, thanks!
I'm now able to extract the PAPI counters from standalone functions by running
the function exported as an `.so` library in C++, with the above PAPI code!
I'll use this method to get the data I need.
Now, looking forward, I'm thinking how best to expos
Many thanks `tkonolige`; I think this is a good excuse for me to learn more
about the internals of the TVM runtime and profiling.
I've started with making a simple C++ deployment of the `matmul_add`, with the
goal of using it to implement Option 1.
I am following the basic structure of `apps/
Thanks for the reply @comaniac.
I don't need these _exact_ tensors, just to recreate them from the available
properties (e.g. their shapes, expressions of how they are generated). I'm
wondering if it's possible to recreate them from scratch at the Relay level,
just by reading the properties
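As a toy illustration of what I have in mind (the names, shapes, and dtypes here are invented):

```python
from tvm import relay

# Invented example of the recorded properties: tensor name -> (shape, dtype)
recorded = {
    "data": ((1, 3, 224, 224), "float32"),
    "weight": ((64, 3, 7, 7), "float32"),
}

# Recreate placeholder tensors at the Relay level purely from those properties
relay_vars = [relay.var(name, shape=shape, dtype=dtype)
              for name, (shape, dtype) in recorded.items()]
print(relay_vars)
```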
I am developing in Ansor, which tunes and evaluates workloads in a model
independently, and can evaluate them standalone using the utilities in
[measure.py](https://github.com/apache/tvm/blob/main/python/tvm/auto_scheduler/measure.py).
I want to compile a single workload in a model using the d
The documentation lists that, as a method of `tvm.auto_scheduler.ComputeDAG`, we
can get a Python code representation of the schedule with
[`print_python_code_from_state()`](https://tvm.apache.org/docs/api/python/auto_scheduler.html?highlight=auto_scheduler#tvm.auto_scheduler.ComputeDAG.print_pyt
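For context, this untested sketch is roughly how I imagine using it, with a toy registered workload standing in for a real one and `matmul_tuning.json` as a placeholder for an existing Ansor log:

```python
import tvm
from tvm import auto_scheduler, te

# Toy workload, purely for illustration
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024), target="llvm")
# "matmul_tuning.json" is a placeholder for a log produced by an earlier tuning run
inp, _ = auto_scheduler.load_best_record("matmul_tuning.json", task.workload_key)
print(task.compute_dag.print_python_code_from_state(inp.state))
```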
I am trying to understand more about the auto-scheduler, and the output from
the Ansor tuning process.
If I examine the output JSON from tuning, I see a series of transformations that
have been learned as an autoschedule, e.g.:
```
["SP", 2, 24, 3, [3], 1],
["RE", 2, [0, 4, 8, 12, 16, 1, 5, 9,
I am also curious about this. I have searched the code, and the only place these
targets are mentioned is in the doc-string of `tvm.target.cuda` itself.
Is there any benefit to using the right GPU model? Or is it something the CUDA
compiler will figure out itself? Could this create issues for re
I've got 4 LITTLE cores and 4 big cores. In other code I've written for these
platforms, I've been able to use all 8 cores and observe interesting behaviour.
I've looked at [this
thread](https://discuss.tvm.apache.org/t/android-big-little-core-control/2100/3),
and opinion seems to be mixed.
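For what it's worth, the kind of control discussed there is the threadpool affinity, which (if I recall correctly) can be set through the `runtime.config_threadpool` packed function; the affinity-mode values below are my reading of that thread, so treat them as assumptions:

```python
import tvm

# Assumption from the linked thread: mode 1 = big cores, -1 = LITTLE cores, 0 = default
config_threadpool = tvm.get_global_func("runtime.config_threadpool")
config_threadpool(1, 4)  # e.g. ask the runtime threadpool for 4 big cores
```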
I was [looking for something like this a couple of months
back](https://discuss.tvm.apache.org/t/reshape-in-place-using-relay/6856), but
to no avail.
It would be useful to have; I'm just unsure what changes would be needed. In a
sense, we already have in-place operations when we fuse conv2d+relu layers.