I am trying to run the following code on my MacBook Pro, but I do not know how to
set the value for "target". I built TVM from source and downloaded the latest
LLVM release for Mac, which appears to build OK, but CUDA does not work because
I could not find a CUDA driver for Mac. I appreciate your help.
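For reference, a minimal sketch of setting `target` to the LLVM CPU backend on a CUDA-less macOS build (the compute here is just a placeholder, not the original code):
```
import tvm
from tvm import te
import numpy as np

# Without a CUDA driver, target the LLVM CPU backend instead of "cuda".
target = "llvm"

# Placeholder compute just to exercise the target.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
func = tvm.build(s, [A, B], target=target)

ctx = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), ctx)
b = tvm.nd.array(np.zeros(n, dtype="float32"), ctx)
func(a, b)
```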
Thanks for the advice, haichen!
I think I should try optimizing winograd using AutoTVM.
Thank you.
---
[Visit Topic](https://discuss.tvm.ai/t/topi-winograd-convolution-performance-is-too-slow/6161/8) to respond.
I got the message "CMake Error at cmake/modules/CUDA.cmake:29: cannot find
CUDA, USE_CUDA=ON" during the cmake step. What could be wrong? I am using the
nvidia/cuda:10.0-base-ubuntu16.04 Docker image.
---
[Visit Topic](https://discuss.tvm.ai/t/cannot-find-cuda-use-cuda-on/6180/1) to respond.
I didn't compare these two implementations using the fallback config before. But
based on your observation, I think it is the case that the winograd fallback config
does perform poorly.
---
[Visit Topic](https://discuss.tvm.ai/t/topi-winograd-convolution-performance-is-too-slow/6161/7) to respond.
I have deployed some models on an RK3288, where inference completes in, for example, 5 ms. But
I found that the TVM threads seem to keep consuming CPU after the model
inference has completed. I tested with the following code:
tvm::runtime::Module mod =
(*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(...)
Thank you for the reply, haichen! I will try it.
What I find curious is that both convolutions are using the fallback configuration, not
tuned by AutoTVM.
However, winograd is slow, with almost a 200x performance difference. In
general, does winograd perform poorly before tuning?
---
Yes, that's the one for your question. Would you mind changing the title to add
[Solved] if the document addresses your question?
---
[Visit Topic](https://discuss.tvm.ai/t/solved-relationship-between-strategy-compute-schedule/6175/3) to respond.
Ah, I found this in the docs: https://docs.tvm.ai/dev/relay_op_strategy.html
---
[Visit Topic](https://discuss.tvm.ai/t/relationship-between-strategy-compute-schedule/6175/2) to respond.
I started diving a bit deeper into the process from Relay to TVM IR. The
**strategy** is a completely new notion that popped up.
All tutorials on the TVM documentation site focus on **compute** and
**schedule**. My understanding is that **compute** defines WHAT, while **schedule**
defines HOW or WHEN.
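A minimal sketch of that split (a hypothetical vector add, not taken from any particular tutorial): the compute declares WHAT to produce; the schedule then decides HOW the loops are arranged.
```
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")

# compute: WHAT to produce (a pure description, no loop order yet)
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# schedule: HOW/WHEN to produce it (loop splitting, vectorization, ...)
s = te.create_schedule(C.op)
xo, xi = s[C].split(C.op.axis[0], factor=8)
s[C].vectorize(xi)

print(tvm.lower(s, [A, B, C], simple_mode=True))
```
The strategy layer sits above this: for a given Relay op and target, it selects which compute/schedule implementation to use.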
Correct. You can tweak the schedule to change the launch config, but as a user
you shouldn't need to care about the exact grid/block size.
If you really want the best performance, use AutoTVM to tune your schedule, and the
resulting grid/block size will be optimal based on real measurements.
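A small sketch of what "tweak the schedule" means here (a hypothetical element-wise kernel, not from this thread): the split factor and the thread-axis bindings are what fix blockDim/gridDim.
```
import tvm
from tvm import te

n = 1 << 20
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

s = te.create_schedule(B.op)
# factor=256 becomes blockDim.x; the outer extent (n / 256) becomes gridDim.x.
bx, tx = s[B].split(B.op.axis[0], factor=256)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))

func = tvm.build(s, [A, B], target="cuda")
```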
---
Hi:
Thank you for your help!
So, based on my understanding of the code, in Python
```
func(a, b, c)
```
will call this:
```
void operator()(TVMArgs args,
                TVMRetValue* rv,
                void** void_args) const
```
And grid_dim and block_dim are inferred from **TVMArgs args**.
@tqchen Would it be possible for you to run clang-format over the entire code base so
that we can add a checker to CI? If we have concerns about correctness issues
that could potentially be introduced by clang-format, we might be able to assign a few
people to do so.
---
A related PR with more discussion
https://github.com/apache/incubator-tvm/pull/5202
---
[Visit Topic](https://discuss.tvm.ai/t/ci-lint-enabling-clang-format-based-lint-checks/6170/2) to respond.
I have experienced that a considerable number of review cycles are spent fixing
code-style issues that escape the current linter, e.g., pointer formatting
(`int* var` vs `int *var`). Moreover, we have a .clang-format file in TVM. I was
wondering: could there be some work done incorporating these two?
The answer is that we use the CUDA driver API to launch kernels from C++ code.
`kernel<<<...>>>(a, b, c)` is not the only way to launch a kernel,
and it requires compiling with NVCC.
See
https://github.com/apache/incubator-tvm/blob/e0122c0ea68043372220e4e02b81692c34832227/src/runtime/cuda/cuda_module.cc#L1
Have you used AutoTVM to tune the winograd template? The default schedule could
be slow.
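A hedged sketch of what that tuning could look like (assuming a Relay module `mod` and `params` as in the rest of the thread; the log file name and trial counts are placeholders):
```
import tvm
from tvm import autotvm, relay

tasks = autotvm.task.extract_from_program(mod["main"], target="cuda", params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=3, min_repeat_ms=150))

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(n_trial=min(1000, len(task.config_space)),
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file("winograd_tune.log")])

# Compile again with the best configs found so far instead of the fallback ones.
with autotvm.apply_history_best("winograd_tune.log"):
    graph, lib, params = relay.build(mod, target="cuda", params=params)
```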
---
[Visit Topic](https://discuss.tvm.ai/t/topi-winograd-convolution-performance-is-too-slow/6161/5) to respond.
Experiencing the same problem here. The tutorial on [implementing a pass using
Python decorators](https://docs.tvm.ai/tutorials/dev/relay_pass_infra.html#implement-a-pass-using-python-decorator)
does not seem to work. While `transform_function` is being called, the
`visit_const` it wraps is not.
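For comparison, a minimal sketch of the decorator pattern that tutorial describes (the class and names here are made up; note that `ExprMutator` dispatches constant nodes to a method named `visit_constant`, so an override has to use that exact name to be called):
```
import tvm
from tvm import relay

@relay.transform.function_pass(opt_level=1)
class MultiplyConstants:
    """Hypothetical pass that scales every constant in a function."""

    def __init__(self, multiplier):
        self.multiplier = multiplier

    def transform_function(self, func, mod, ctx):
        obj = self

        class Rewriter(relay.expr_functor.ExprMutator):
            # ExprMutator only invokes this override if it is named visit_constant.
            def visit_constant(self, const):
                return relay.multiply(const, obj.multiplier)

        return Rewriter().visit(func)

# Usage: instantiating the decorated class yields a FunctionPass.
custom_pass = MultiplyConstants(relay.const(2.0, "float32"))
```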
Hello!
I am currently implementing Winograd conv2d through Relay.
data_shape = (1, 64, 224, 224)
w_shape = (64, 64, 3, 3)
input_data = relay.var('input', relay.TensorType(data_shape, "float32"))
p1 = relay.var('wgt', relay.TensorType(w_shape, "float32"))
FIR = relay.nn.contrib_
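The snippet is cut off above; purely as an assumption about how it might continue, one way to build the winograd form explicitly is through the contrib ops, for example:
```
from tvm import relay

# Hypothetical continuation -- tile_size and the conv attributes are guesses, not the original code.
wgt_t = relay.nn.contrib_conv2d_winograd_weight_transform(p1, tile_size=4)
FIR = relay.nn.contrib_conv2d_winograd_without_weight_transform(
    input_data, wgt_t, tile_size=4,
    padding=(1, 1), channels=64, kernel_size=(3, 3))
func = relay.Function([input_data, p1], FIR)
```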
BTW, I am also wondering whether the TVM stack supports CUDA streaming features like these:
https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/
---
[Visit Topic](https://discuss.tvm.ai/t/how-cuda-kernel-is-launched-in-tvm-stack/6167/2) to respond.
Hi all:
I am learning the TVM CUDA backend. I have a question about how a CUDA kernel is
launched.
Below is my simple test program:
```
import tvm
from tvm import te
import numpy as np
dtype = "float32"
# GEMM size
M=16;K=8;N=16
# declare algorithm
k = te.reduce_axis((0, K), 'k') # loop over the reduction dimension K
```
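The program above is cut off, so here is a hedged sketch of how such a GEMM could be completed and built for CUDA (the schedule is my own minimal choice, not the original program); calling the built function is what actually launches the generated kernel:
```
import tvm
from tvm import te
import numpy as np

dtype = "float32"
M = 16; K = 8; N = 16

k = te.reduce_axis((0, K), 'k')
A = te.placeholder((M, K), name='A', dtype=dtype)
B = te.placeholder((K, N), name='B', dtype=dtype)
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name='C')

s = te.create_schedule(C.op)
# Bind loops to CUDA thread axes; these bindings define the launch configuration.
s[C].bind(C.op.axis[0], te.thread_axis("blockIdx.x"))
s[C].bind(C.op.axis[1], te.thread_axis("threadIdx.x"))

func = tvm.build(s, [A, B, C], target="cuda")

ctx = tvm.gpu(0)
a = tvm.nd.array(np.random.rand(M, K).astype(dtype), ctx)
b = tvm.nd.array(np.random.rand(K, N).astype(dtype), ctx)
c = tvm.nd.array(np.zeros((M, N), dtype=dtype), ctx)
func(a, b, c)  # this call is where the CUDA kernel is launched
```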
Code:
graph, lib, params = relay.build(func, target, params=params)
As far as I know, if I want to run my pre-trained model remotely, I have to
save those three outputs into three files.
But all three files should always be used together, right? If I have multiple models,
it's a little bit difficult to maintain them.
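A hedged sketch of a common save/load pattern for the three outputs (file names are placeholders, not from this thread):
```
import tvm
from tvm import relay

graph, lib, params = relay.build(func, target, params=params)

lib.export_library("deploy_lib.so")          # compiled operators
with open("deploy_graph.json", "w") as f:
    f.write(graph)                           # execution graph (JSON)
with open("deploy_params.bin", "wb") as f:
    f.write(relay.save_param_dict(params))   # weights

# On the target side, the three artifacts are loaded back together:
loaded_lib = tvm.runtime.load_module("deploy_lib.so")
loaded_graph = open("deploy_graph.json").read()
loaded_params = bytearray(open("deploy_params.bin", "rb").read())
```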
I conducted additional experiments.
When using the Conv 1.2 layer of the VGG-16 network, according to the
[paper](https://arxiv.org/abs/1509.09308), winograd performance should be better than
direct conv2d.
But the result is that direct conv2d is better.
input_img shape = (1, 64, 224, 224) ## NCHW format