Hi all,
Thanks for the interesting discussion! So, we all agree that there are three
points here:
* Backend API
* Calling convention
* Runtime API
As things stand today, memory allocation is part of the backend API. This will
change with global memory planning, but for now I would tend to ski
FYI: I will be out for Easter holidays until Tuesday (so I will reply to any
comments as soon as I come back :slight_smile: )
---
[Visit Topic](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206/15)
to respond.
You are receiving this because you enabled mailing list mode.
Also, a side comment: I will be out for Easter holidays until Tuesday (so I
will reply to any comments as soon as I come back :slight_smile: )
---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-standalone-code-generation-and-c-runtime-for-stm32-bare-metal-devices/9562/8)
to respond.
Hi all,
I just published the AOT PR upstream: https://github.com/apache/tvm/pull/7785.
It has some conflicts, probably due to the `CompileEngine` refactoring, which I
will fix soon. I just wanted to let you start having a look.
@stoa I am wondering how much of your work can use the A
Hi all,
I was finally able to have a first version of the AOT work in a PR upstream.
## PR
You can find the PR here: https://github.com/apache/tvm/pull/7785
At this stage, I gladly accept any feedback on things that can be improved in
the PR or on issues I might have overlooked. Please, help
Hi Andrew,
> for AOT runtime I agree we do not need JSON parsing or any of the underlying
> facilities it brings. However, given it seems like you’re planning to reuse
> the C-runtime memory allocator and interfaces in include/tvm/crt/platform.h,
> I think it would be great to continue using
Hi @comaniac,
May I ask how the graph ends up with `nn.conv2d + nn.relu + nn.conv2d +
nn.relu`? Is the graph going through a BYOC kind of partitioning (sorry if the
question is naive)?
As for S1 vs S2, could we do both? Use a heuristic like "ignore the task
without any call node" and th
Hi all,
I am trying to understand the role of the LLVM auto-vectorizer in TVM. Indeed,
in `llvm_codegen.cc` we explicitly set:
```cpp
builder.LoopVectorize = true;
builder.SLPVectorize = true;
```
And I am trying to determine to what level TVM is relying on LLVM
auto-vectorization.
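For readers less familiar with the two flags above: `LoopVectorize` widens an entire loop so each iteration processes several elements at once (plus a scalar epilogue for the remainder), while `SLPVectorize` packs independent scalar statements in straight-line code into vector operations. A rough pure-Python sketch of what the loop-vectorization transform does conceptually (no LLVM involved; `VF` and the function names are made up for illustration):

```python
VF = 4  # hypothetical vector factor picked by the vectorizer


def scalar_sum(a, b):
    # Original scalar loop: one element per iteration.
    return [a[i] + b[i] for i in range(len(a))]


def vectorized_sum(a, b):
    # What LoopVectorize conceptually emits: a main loop handling
    # VF lanes per iteration, plus a scalar epilogue for the tail.
    n = len(a)
    out = [0] * n
    i = 0
    while i + VF <= n:
        # One "vector" add over VF lanes at a time.
        out[i:i + VF] = [x + y for x, y in zip(a[i:i + VF], b[i:i + VF])]
        i += VF
    while i < n:  # scalar epilogue for the remaining n % VF elements
        out[i] = a[i] + b[i]
        i += 1
    return out
```

Both functions compute the same result; the question in this thread is how much of this widening TVM leaves to LLVM versus expressing it explicitly in the schedule.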
### Wh
Maybe I am wrong, but are you sure that when `cfg.is_fallback` is set,
parameters like `cfg['tile_co']` are not defined? We usually set them to some
default values (I think). But even if we don't set them, IIUC they will get
"some" value among the possible ones. Am I missing something?
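To make the "some value" point concrete, here is a minimal sketch of how a fallback configuration could hand out a value even for a knob nobody tuned. This only mimics the behaviour I describe above; the `FallbackConfig` class and its method names are made up for illustration, not TVM's actual API:

```python
class FallbackConfig:
    """Toy stand-in for a fallback tuning config: every knob declares
    its candidate values, and an untuned knob resolves to the first."""

    def __init__(self):
        self.is_fallback = True
        self._candidates = {}
        self._chosen = {}

    def define_knob(self, name, candidates):
        self._candidates[name] = candidates

    def __setitem__(self, name, value):
        # Explicit default set by the schedule writer.
        self._chosen[name] = value

    def __getitem__(self, name):
        # Even if the knob was never set, it still yields *some*
        # value among the declared candidates.
        if name in self._chosen:
            return self._chosen[name]
        return self._candidates[name][0]


cfg = FallbackConfig()
cfg.define_knob("tile_co", [1, 2, 4, 8])
assert cfg.is_fallback and cfg["tile_co"] == 1  # never tuned, still defined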
---
cc: @anijain2305, @FrozenGene, @matt-arm, @ramana-arm
---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-improve-quantized-convolution-through-mmla-instruction/8336/2)
to respond.
## Introduction and motivation
This RFC is the third set of optimizations to enhance quantized convolution on
Arm architectures. To give a brief summary:
* Basic Armv8-A convolution implementation (through gemm):
https://discuss.tvm.apache.org/t/rfc-improve-quantized-convolution-performance-f
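For context on the gemm formulation referenced above: the convolution is lowered to a matrix multiply by unrolling input patches into rows (im2col) and dotting each row with the flattened kernel, accumulating in a wider type. A minimal pure-Python sketch of the idea (illustration only, single channel, no TVM APIs):

```python
def im2col(data, kh, kw):
    # data: H x W list of ints; returns one row per output pixel,
    # each row holding the kh*kw input patch for that pixel.
    h, w = len(data), len(data[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([data[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def gemm_conv(data, kernel):
    # Quantized-style conv = im2col followed by one dot product per
    # patch, accumulating the int8-range products in a wide integer
    # (conceptually int32).
    kh, kw = len(kernel), len(kernel[0])
    flat_k = [v for row in kernel for v in row]
    return [sum(a * b for a, b in zip(patch, flat_k))
            for patch in im2col(data, kh, kw)]


# 3x3 input, 2x2 kernel -> 2x2 output, flattened row-major
out = gemm_conv([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
# -> [6, 8, 12, 14]
```

The point of the gemm lowering is that the inner dot products map directly onto the SIMD multiply-accumulate instructions the RFCs in this series target.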
Hi @FrozenGene,
I think I see why we don't want to change the layout when there is no workload
(no workload means we don't even know the strategy, I think). What I am missing
is why we don't want to change the layout when `cfg.is_fallback`. In that case,
the strategy is defined, so we know how the weigh
Hi @FrozenGene, @anijain2305
I can confirm that this works :partying_face:! Very good! Now we can implement
algorithms like QNNPACK and let the tuner try them together! Thanks to you both!
As for the API change, I agree with @FrozenGene that maybe it would be cleaner
to add `tinfos` to the `
I got a bit confused above, sorry. It is not about the `inputs` but about the
`tinfos`.
Just to avoid any additional confusion I tried to print the types of the
interesting variables
**conv2d_alter_op(attrs, inputs, tinfos, out_type)**
```python
print(type(inputs[0]))
#
print(type(tinfos[0]))
```
Thanks for the reply, @FrozenGene!
The signatures of the two functions are:
```python
def _alter_conv2d_layout(attrs, inputs, types, out_type):
```
```python
def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
```
While they look similar, `inputs` in `_alter_conv2d_layout` contains actual
`Tensor`s
cc @anijain2305 @ramana-arm @FrozenGene (we had this discussion before)
---
[Visit Topic](https://discuss.tvm.apache.org/t/quantized-models-and-legalization-pass/8253/2)
to respond.
Hi all,
I am trying to improve quantized performance for memory bound operators (e.g.,
depthwise or 1x1 convolutions with small shapes).
### Bottom line question
Is there any way we can know the strategy picked by the autotuner during the
legalization pass of a quantized convolution (qnn.co
From what I see, in `tvmc.compiler`, `export_library()` is called with a
`mod.so` input.
I agree we could generate the `tar` file directly, but I think this was done to
avoid storing the `.c` files (@leandron will know more than me on this).
As for storing directly in the dylib, I am not
Hi @tqchen,
`tvmc` saves the `.so`, `.params`, and `.json` directly in the `.tar` file
it generates. This happens in `tvmc/compiler.py`. I might be wrong, but
probably this is because it doesn't want to store the `.c` files in the final
artifact (@leandron, can you confirm this?).
---
Hi @aca88,
The object file produced by `tvmc` does not necessarily include the C runtime.
Using a `--bare-metal` flag just refers to the fact that it is mostly useful on
a bare-metal target.
Anyway, to avoid confusion, I think `--object-file` might be a better choice
:slight_smile:
cc: @leandron, @ramana-arm
---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-optionally-include-object-file-generation-in-tvmc/8120/2)
to respond.
## Motivation
Currently `tvmc` will only produce a dynamic library version of the network,
i.e., an `.so` file stored alongside the other artifacts. This library is
usually dynamically linked to other applications.
With this change we want to add a flag to `tvmc` to get an object file (i.e.,
cc @anijain2305, @FrozenGene, @ramana-arm
---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-accelerate-quantized-convolution-through-dot-product/7873/2)
to respond.
## Motivation
In recent RFCs we successfully boosted convolution performance on native
Armv8-A architectures. When using Armv8.2-A and above ISAs, developers are
provided with a richer set of instructions, among which the dot-product
instruction `udot` (or `sdot`) can be particularly useful
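For reference on what `udot`/`sdot` compute: each 32-bit lane of the accumulator receives the sum of four 8-bit products taken from the corresponding four-byte groups of the two operands. A pure-Python model of that per-lane semantics (this illustrates the instruction's arithmetic only, not TVM code; the function names are made up):

```python
def sdot_lane(acc, a4, b4):
    # One 32-bit accumulator lane: acc += sum of four int8 products.
    assert len(a4) == len(b4) == 4
    return acc + sum(x * y for x, y in zip(a4, b4))


def sdot(acc, a, b):
    # Full 128-bit register view: 4 accumulator lanes; each operand
    # holds 16 int8 values consumed in groups of four.
    assert len(acc) == 4 and len(a) == len(b) == 16
    return [sdot_lane(acc[i], a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
            for i in range(4)]


acc = sdot([0, 0, 0, 0], list(range(16)), [1] * 16)
# lane 0: 0+1+2+3 = 6, lane 1: 4+5+6+7 = 22, lane 2: 38, lane 3: 54
```

This is why the instruction maps so well onto the inner loop of an int8 gemm: one `sdot` performs sixteen multiplies and accumulations that would otherwise need widening and pairwise additions.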