I was trying to compile a tensor expression for an OpenCL target, and while inspecting the kernel generated via the imported_modules attribute of the built module, there seemed to be a lot of arguments called stride1, stride2 and so on being passed to the kernel. Is there a way to avoid passing these stride arguments?
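In case it helps to reproduce what I am seeing, here is a minimal sketch (the workload is a made-up elementwise example, not my actual code) of building for OpenCL and dumping the generated kernel through imported_modules:

```python
import tvm
from tvm import te

# toy elementwise workload, purely for illustration
n = 1024
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))

mod = tvm.build(s, [A, B], target="opencl")
# the device kernel lives in imported_modules; print its OpenCL source
print(mod.imported_modules[0].get_source())
```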
Hey, I was trying to execute the functions given in topi
(https://tvm.apache.org/docs/api/python/topi.html) using the opencl target, but
only the ceil and floor operations seem to work; the rest throw an error in TVM.
Here is my code.
> import tvm
>
> from tvm import te, topi
>
> import numpy as np
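For context, a simplified version of the pattern I am attempting looks roughly like this (the op, shapes, and schedule below are placeholders, not my exact code):

```python
import numpy as np
import tvm
from tvm import te, topi

n = 1024
A = te.placeholder((n,), name="A", dtype="float32")
B = topi.sqrt(A)  # stand-in for the various topi ops I am testing

# explicit GPU-style schedule with thread binding
s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))

f = tvm.build(s, [A, B], target="opencl")

dev = tvm.cl(0)
a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
f(a, b)
print(b.asnumpy()[:5])
```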
You're right. Graph tuner only supports CPU.
This might be a pretty naive question, but I was wondering why the graph
optimization (tune_graph with graph_tuner) exists in the [x86
tutorial](https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html)
but is missing in the [CUDA
example](https://tvm.apache.org/docs/tutorials/autotv
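For reference, the graph-tuning step I mean is roughly the following (paraphrased from memory of the x86 tutorial; `graph`, `dshape`, `records`, `opt_sch_file`, and `target` come from the rest of that tutorial, and the input name "data" is the tutorial's):

```python
from tvm import relay
from tvm.autotvm.graph_tuner import DPTuner, PBQPTuner

def tune_graph(graph, dshape, records, opt_sch_file, target, use_DP=True):
    # only nn.conv2d layouts are considered by the graph tuner here
    target_op = [relay.op.get("nn.conv2d")]
    Tuner = DPTuner if use_DP else PBQPTuner
    executor = Tuner(graph, {"data": dshape}, records, target_op, target)
    executor.benchmark_layout_transform(min_exec_num=2000)
    executor.run()
    executor.write_opt_sch2record_file(opt_sch_file)
```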
This function calculates the total size of a config space:
https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/task/space.py#L838
And this function goes through each config in the config space:
https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/task/space
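If you just want to see those numbers from Python, a small sketch like this should do it (here `task` is assumed to be an autotvm task, e.g. one returned by autotvm.task.extract_from_program in the tuning tutorials):

```python
def inspect_config_space(task, num_show=5):
    """Print the total size of an autotvm task's config space plus a few entries."""
    space = task.config_space
    print("total configs:", len(space))   # total size of the space
    for idx in range(min(len(space), num_show)):
        print(idx, space.get(idx))        # concrete ConfigEntity at this index
```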
I have a question about the config space in autotvm. I have searched the scripts for how the config space is calculated but have not found anything. If anyone has some ideas on how this is calculated, it would help me.
For example, we can take the Auto-tuning a convolutional network for x86 CPU tutorial here
Hi [tqchen](https://discuss.tvm.apache.org/u/tqchen),
Thank you for your reply!
We are currently using TVM v0.6. So is there any way to reduce the memory usage after building an operator in TVM v0.6? Can I call a method to free the memory? Thank you in advance!
BTW, I have tested these tw
I'm not particularly familiar with `annotation.stop_fusion`, other than that it
seems like something that is introduced to block the FuseOps pass. My naive
solution here would be to register `annotation.stop_fusion` as a supported
operator for your codegen. You can then later remove/ignore it.
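An untested sketch of what I mean, assuming a BYOC-style codegen; `my_codegen` is a placeholder for your codegen name, and the annotator signature has varied a bit across TVM versions:

```python
import tvm

# Mark annotation.stop_fusion as supported by the external codegen
# "my_codegen" (placeholder name) so partitioning doesn't reject it;
# the codegen itself can then simply strip or ignore the op.
@tvm.ir.register_op_attr("annotation.stop_fusion", "target.my_codegen")
def _stop_fusion_supported(expr):
    return True
```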
I wonder whether it is possible for TVM to support CUDA warp-level sync operations. For example, if I want to use shuffle intrinsics, what should I do? If that is not possible, then I have to use shared memory. But then TVM will generate syncthreads, which is overkill. If I load and consume shared memory o
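For concreteness, the kind of thing I have in mind is something like the sketch below, which just splices the raw CUDA intrinsic in via tir.call_pure_extern (untested, and probably not the intended way, if there is one):

```python
import tvm
from tvm import te

# toy warp-sized example: each lane reads a neighbour's value via __shfl_down_sync
n = 32
A = te.placeholder((n,), name="A", dtype="float32")
full_mask = tvm.tir.const(0xFFFFFFFF, "uint32")
B = te.compute(
    (n,),
    lambda i: tvm.tir.call_pure_extern("float32", "__shfl_down_sync", full_mask, A[i], 1),
    name="B",
)

s = te.create_schedule(B.op)
s[B].bind(B.op.axis[0], te.thread_axis("threadIdx.x"))

mod = tvm.build(s, [A, B], target="cuda")
print(mod.imported_modules[0].get_source())  # generated CUDA contains the shuffle call
```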