Hello,
Which OpenCL version is required by TVM's OpenCL backend?
I could not find any information in the TVM documentation.
I tried OpenCL 1.2 drivers, but apparently the headers do not match.
The target is an ODroid XU4 using an Exynos SoC with a Mali T628.
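For reference, this is roughly how I am building for the board (a sketch only; the target and host strings are my assumptions for the XU4, and the ResNet-18 test workload just stands in for the real model):

```python
import tvm
from tvm import relay
from tvm.relay import testing

# ResNet-18 test workload as a stand-in for the real model.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

# Assumed target/host strings for the XU4 (32-bit ARM host, Mali GPU via the
# OpenCL backend); adjust the triple and device options to your toolchain.
target = tvm.target.Target("opencl -device=mali",
                           host="llvm -mtriple=armv7l-linux-gnueabihf")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
```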
---
I did not encounter this problem.
What does your config.cmake file look like, especially the PAPI line?
Does PAPI work if you run the tests or binaries like `papi_native_avail`?
Did you compile PAPI with `./configure --prefix="" --with-components="cuda"`?
Have you tried running `make cle
Hi @tkonolige ,
thanks to your help, I was able to measure data on a range of different devices.
But there might be a problem with the CMake script if PAPI is not installed in a
standard location:
I tried using this setup on a cluster environment, where PAPI was installed in
my home directory
Hi @tkonolige ,
during the build process the `papi.h` header file, which is part of the PAPI
installation, could not be found.
I added `include_directories(${USE_PAPI}/../../include)` as a workaround, which
seems to work fine for now.
---
Hey @tkonolige ,
can you give me a hint on how to build your PR?
I built PAPI for my targets before pulling it and set the flag `USE_PAPI ON` in
config.cmake, but I am not sure how to use it to collect power consumption data
with NVML or CUDA on Nvidia GPUs.
Thanks in advance :slight_smile:
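My current guess at how the collector would be wired up is sketched below; the class and method names (`PAPIMetricCollector`, `profile()` on the debug executor) and the metric string are assumptions based on the PR discussion, not something I have verified, and `papi_native_avail` should list the real nvml/cuda event names:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.relay import testing
from tvm.runtime import profiling
from tvm.contrib.debugger import debug_executor

# ResNet-18 test workload as a stand-in for the real network.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda", params=params)

dev = tvm.cuda(0)
gr = debug_executor.create(lib.get_graph_json(), lib.lib, dev)

# Metric name is a placeholder; use `papi_native_avail` to find the actual
# nvml/cuda event names (e.g. a per-device power reading) on your machine.
collector = profiling.PAPIMetricCollector({dev: ["nvml:::device_0:power"]})
data = tvm.nd.array(np.random.rand(1, 3, 224, 224).astype("float32"), device=dev)
report = gr.profile(data=data, collectors=[collector])
print(report)
```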
Hi,
I tried to follow the current development around µTVM and AOT compilation, but
I am still quite a bit confused, as it seems like AOT will replace the efforts
around µTVM's runtime?
Will the µTVM runtime development be continued? If I understood correctly,
according to the RFC for AOT ther
Ah, yes, thank you. I am not sure what I did wrong on the first try, but now
it is working.
Thank you very much :slight_smile:
---
Hi @manupa-arm ,
Thank you for this work :slight_smile:
I tried to access this data after compiling the GraphExecutorFactory, but I am a
bit confused by the output:
The output of function_metadata is JSON, which seems to represent a
dictionary, but the values for each function look like thi
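For reference, this is roughly how I am reading it out (just a sketch, assuming the factory module returned by relay.build exposes function_metadata as an attribute; the ResNet-18 test workload stands in for my real model):

```python
import tvm
from tvm import relay
from tvm.relay import testing

mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# function_metadata maps generated function names to FunctionInfo records;
# printing an entry gives the JSON-like dump described above.
for name, info in lib.function_metadata.items():
    print(name)
    print(info)
```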
I tried to match the output of relay.analysis.extract_fused_functions with the
execution times measured by the debug executor, but was not able to find a way
to correlate the two sets.
I already asked for help a couple of weeks ago, but did not know about
extract_fused_functions at this
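To make it concrete, this is roughly what I am doing (a sketch using the ResNet-18 test workload; the keys are hashes of the fused functions, and relating them to the fused_... names reported by the debug executor is the part I cannot figure out):

```python
import tvm
from tvm import relay
from tvm.relay import testing

mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

# extract_fused_functions returns a dict keyed by a hash of each fused function.
with tvm.transform.PassContext(opt_level=3):
    fused = relay.analysis.extract_fused_functions(mod)

for fhash, func in fused.items():
    print(fhash)
    print(func)
```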
I tried to get information about which functions have been fused in my compiled
IRModule.
The network is an MXNet version of ResNet-18 and has been compiled with
relay.build_module.build(mod, target, params=params).
When I list the functions of the IRModule, I get only a single function with
I am a bit confused,
maybe I misunderstood your suggestion.
I am using the debug executor to measure the latency of the individual (fused)
TIR functions,
but I cannot tell which function corresponds to which part of the
original/optimized Relay graph.
(Example of a TIR function name: fused_layout_t
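For context, this is how I am collecting the per-function numbers (a sketch; module names may differ between TVM versions, older releases call it debug_runtime, and the ResNet-18 test workload stands in for my network):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.relay import testing
from tvm.contrib.debugger import debug_executor

mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

dev = tvm.cpu(0)
m = debug_executor.create(lib.get_graph_json(), lib.lib, dev, dump_root="/tmp/tvmdbg")
m.set_input("data", tvm.nd.array(np.random.rand(1, 3, 224, 224).astype("float32")))
# run() executes node by node and records per-node timings under dump_root;
# the node names are the fused_... functions I am trying to map back.
m.run()
```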
For a project, I want to train a number of models that can predict the
execution time of a layer (from its relay description) on different hardware
targets.
My current problem is that I am unable to find a good way to do this.
The Debug Runtime measures the execution time of the low-level
You might want to look into the BYOC flow.
[TVM Blog - How to Bring Your Own Codegen to TVM](https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm)
It looks like a perfect solution for your task. You most likely need to do
three things:
1. Define which subgraphs and nodes need to b
First composite, afterwards annotate.
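A rough sketch of that ordering with the standard partitioning passes (the pattern and the "my_target" name are placeholders for your own codegen; MergeCompilerRegions is optional):

```python
import tvm
from tvm import relay
from tvm.relay import testing
from tvm.relay.dataflow_pattern import is_op, wildcard

# Placeholder pattern and codegen name; replace with your backend's patterns.
def conv2d_relu_pattern():
    return is_op("nn.relu")(is_op("nn.conv2d")(wildcard(), wildcard()))

pattern_table = [("my_target.conv2d_relu", conv2d_relu_pattern())]

mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

seq = tvm.transform.Sequential([
    relay.transform.MergeComposite(pattern_table),  # composite first ...
    relay.transform.AnnotateTarget("my_target"),    # ... then annotate
    relay.transform.MergeCompilerRegions(),         # optional: grow larger regions
    relay.transform.PartitionGraph(),               # split regions into subgraphs
])
with tvm.transform.PassContext(opt_level=3):
    partitioned_mod = seq(mod)
print(partitioned_mod)
```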
---
def @main(%input_1: Tensor[(1, 224, 224, 3), float32]) -> Tensor[(1, 1000), float32] {
  %69 = nn.pad(%input_1, pad_width=[[0, 0], [0, 1], [0, 1], [0, 0]]) /* ty=Tensor[(1, 225, 225, 3), float32] */;
  %70 = multiply(%69, 16f /* ty=float32 */) /* ty=Tensor[(1, 225, 225, 3), float32
Hello,
I've got a new question about the BYOC flow.
In my current implementation, `annotation.stop_fusion` instructions are added to
the partitioned Relay description of the network, usually separating individual
network nodes.
Is there a way to disable this behaviour, to enable passing multi-l
Hi, it looks like I am almost done with implementing the BYOC runtime for my
target.
But in the Run function, which is called by the PackedFunc returned by
GetFunction(),
it looks like the outputs of the subgraph need to be written back to the
DLTensor.
But I am not sure which form/layou
Ah, I guess I found my issue:
I forgot to implement the Run function.
Weight tensors are initialized inside the Init function, but inputs only at Run.
Thank you very much
---
OK, thank you, that works, but for some reason only for constant tensors.
The input tensors still have a null pointer in their data field.