Much appreciated, @hjiang, that's the key point. Your recommendation is at the top of my list to study in depth. Here is another issue to understand: why are there two dependence queues (FIFOs) between modules? From the perspective of multi-threaded software, only one queue would be required, which is
This problem was solved by linking the cuda and cuda_runtime libraries when building gotvm.
---
[quote="Dileep, post:7, topic:7681"]
i set TRACKER_IP as a IP address of the remote device ( ip address of the iOS
device ), am i right ?
[/quote]
No, the tracker should run on the host machine. I think you also need to start
the tracker with the following command:
```
python -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190
```
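To double-check that the tracker is running and that your device has registered with it, you can also query it from Python. This is just a quick verification sketch; the host/port below assume the tracker was started with the command above.

```
from tvm import rpc

# Connect to the running tracker and print which device keys are registered.
tracker = rpc.connect_tracker("127.0.0.1", 9190)
print(tracker.text_summary())
```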
Yes, if you really want to improve further, you need to analyze more deeply, e.g. figure out which kinds of instructions hurt performance and then try to avoid them (for example by using tensorize). I think your current performance is good enough now.
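In case a concrete starting point helps, below is a rough sketch of declaring a tensor intrinsic with `te.decl_tensor_intrin`, following the pattern of the official tensorize tutorial; the external kernel name `gemv_update` and the tile sizes are placeholders, not anything taken from your schedule.

```
import tvm
from tvm import te

# A hand-declared matrix-vector micro-kernel: c[i] = sum_k a[i, k] * b[k].
def intrin_gemv(m, n):
    a = te.placeholder((m, n), name="a")
    b = te.placeholder((n,), name="b")
    k = te.reduce_axis((0, n), name="k")
    c = te.compute((m,), lambda i: te.sum(a[i, k] * b[k], axis=k), name="c")

    def intrin_func(ins, outs):
        # Replace the lowered loops with a call to a hand-written kernel.
        ib = tvm.tir.ir_builder.create()
        aa, bb = ins
        cc = outs[0]
        ib.emit(tvm.tir.call_extern("int32", "gemv_update",
                                    cc.access_ptr("w"),
                                    aa.access_ptr("r"),
                                    bb.access_ptr("r"), m, n))
        return ib.get()

    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1,
                             strides=[te.var("s1"), 1])
    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1)
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1)
    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})

# Usage on an existing GEMM schedule (sketch): after splitting the spatial and
# reduction axes so the inner tile matches (m, n), call e.g.
#   s[C].tensorize(inner_spatial_axis, intrin_gemv(16, 64))
```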
---
@FrozenGene As for the CPU efficiency,

I noticed that VTune reported above 93% CPU utilization. (I only use a single thread on a single CPU core.)
Does that mean there is not much room for improvement?
---
The CPI rate is a little high. One possible reason is that we generate too many redundant instructions, so tensorizing the GEMM core part may be one solution. As you have performed better than oneDNN, you could compute the CPU efficiency (like 60%, 70%, ...); if you have reached something like 98% efficiency, yo
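For reference, a back-of-the-envelope efficiency calculation could look like the sketch below; the clock frequency, SIMD width, and FMA throughput are assumptions you would replace with your own CPU's numbers, and the runtime is whatever your measurement (e.g. `time_evaluator`) reports.

```
# Single-core GEMM efficiency = achieved GFLOPS / theoretical peak GFLOPS.
M = N = K = 1024            # GEMM shape (example)
runtime_s = 2.1e-3          # measured time per GEMM in seconds (example)

flops = 2.0 * M * N * K     # one multiply + one add per MAC
achieved_gflops = flops / runtime_s / 1e9

freq_ghz = 2.5              # core clock (assumption)
simd_lanes = 16             # fp32 lanes per AVX-512 vector (assumption)
fma_per_cycle = 2           # FMA units per core (assumption)
peak_gflops = freq_ghz * simd_lanes * fma_per_cycle * 2  # 2 flops per FMA

print("efficiency: %.1f%%" % (100.0 * achieved_gflops / peak_gflops))
```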
I get the following error when executing simple code targeted at OpenCL. This is after I installed a new OpenCL platform called pocl. Furthermore, this error occurs even if I just use the LLVM target and the CPU. I can't figure out why.

Hi everyone, I've been working for a couple of months on the TVM stack and I
really like it :slight_smile:
I have a question related to the use of TVM 0.7 APIs, in particular
`tvm.tir.transform.Simplify`.
In TVM 0.6 I could simply call `tvm.tir.ir_pass.Simplify(stmt)` in any of my
custom I
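For what it's worth, in TVM 0.7 the TIR passes operate on an `IRModule` rather than a bare `Stmt`, so the usual pattern is to lower to (or wrap into) a module first and then apply `tvm.tir.transform.Simplify()`. A minimal sketch, assuming 0.7:

```
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 1 + 0, name="B")  # "* 1 + 0" should fold away
s = te.create_schedule(B.op)

mod = tvm.lower(s, [A, B])                # tvm.lower returns an IRModule in 0.7
mod = tvm.tir.transform.Simplify()(mod)   # replaces tvm.tir.ir_pass.Simplify(stmt)
print(mod["main"].body)
```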
Hi Paddi,
Good to know you have an interest in VTA and TLPP. About your question that load/compute/store are serialized: I think you may mean the logic in the unit test function VTA(). VTA(...) is actually not involved when the FPGA does the real compute; such a function only works when running the unit tests. For t

---
Hi @kazum, thank you for the response.
* I ran rpc_proxy on the host machine with the command below
> python -m tvm.exec.rpc_proxy --host [HOST_IP] --tracker [TRACKER_IP]:9190
HOST_IP = IP address of the host machine, TRACKER_IP = IP address of the remote device (iOS device IP)
* Tried to connect the i
@FrozenGene @tqchen Thanks for your advice.
I have written my own schedule and autotvm template. I also tried Intel OneDNN according to the BYOC tutorial. Currently I outperform OneDNN by about 0.4 ms on a single CPU core.
I profiled it with VTune and collected a hotspots report.
[VTune hotspots screenshot]
For the Intel x86 target, firstly, we should read the doc https://tvm.apache.org/docs/tutorials/optimize/opt_gemm.html, which covers important aspects of TVM schedule primitives and their effects. Secondly, I recommend reading https://tvm.apache.org/docs/tutorials/autotvm/tune_simple_template.html
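To give a concrete flavour of the primitives the opt_gemm tutorial walks through (tile, split, reorder, vectorize), here is a small schedule sketch; the block size and split factor are arbitrary examples, not tuned values.

```
import tvm
from tvm import te

# Minimal blocked GEMM schedule, in the spirit of the opt_gemm tutorial.
M = N = K = 1024
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)
bn = 32
xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
ko, ki = s[C].split(k, factor=4)
s[C].reorder(xo, yo, ko, ki, xi, yi)
s[C].vectorize(yi)
print(tvm.lower(s, [A, B, C], simple_mode=True))
```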
I recently read the following autotvm code in tune_relay_arm.py:
`prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))`
but I don't understand what this line means or how it is written. I hope someone can help. Thank you for your answer; let's improve together.
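To illustrate in isolation what that line does (plain Python %-formatting, where `%2d` pads an integer to width 2), here is a tiny standalone example with a made-up task list:

```
tasks = ["task_a", "task_b", "task_c"]  # placeholder list standing in for autotvm tasks
for i in range(len(tasks)):
    # Same pattern as in tune_relay_arm.py: "current task / total tasks"
    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
    print(prefix + tasks[i])
# [Task  1/ 3] task_a
# [Task  2/ 3] task_b
# [Task  3/ 3] task_c
```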
---
@Dileep Sorry for the late response. I've tried auto-tuning on an iOS device before and needed some hacks to make it work (a rough sketch follows the list below).
- Pass a customized build function to LocalBuilder to compile your model with
Xcode.
- Modify autotvm.measure.measure_methods.check_remote to make it return True
always. It
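As a rough sketch of those two hacks (the device key, tracker address, and arch/sdk strings below are illustrative assumptions, and `check_remote` would still need to be patched separately):

```
from tvm import autotvm
from tvm.contrib import xcode

# Custom build function so LocalBuilder produces an iOS dylib via Xcode.
def fcompile(*args):
    xcode.create_dylib(*args, arch="arm64", sdk="iphoneos")
fcompile.output_format = "dylib"

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(build_func=fcompile),
    runner=autotvm.RPCRunner("iphone", host="127.0.0.1", port=9190),
)
```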
Hi,
I am a new learner of TVM and VTA. What confuses me is how the parallelism runs. I can get the idea of it from the virtual threads and the block diagram in the VTA paper, but the code in vta-hw/src confuses me, as in that code the load, compute, and store modules are serialized. What is my problem? I would appreciate it if any
I have dug into this further and now I understand why there is no asynchronous memory access. TVM was made with GPUs in mind (for OpenCL), and a GPU switches warps if the active one stalls due to a memory access.
While this is completely justified for CUDA, I think there should be asynchronous memory
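For what it's worth, the closest existing knob I'm aware of is the `double_buffer` schedule primitive, which prefetches the next tile of a cached read while the current one is being consumed (software pipelining rather than truly asynchronous loads); a minimal sketch with arbitrary sizes:

```
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=64)
AA = s.cache_read(A, "local", [B])
s[AA].compute_at(s[B], xo)
s[AA].double_buffer()   # overlap loading the next chunk of A with compute
print(tvm.lower(s, [A, B], simple_mode=True))
```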
Having the same issue. Any update?
---