[Apache TVM Discuss] [Development/RFC] Implementing AOT in TVM

2021-04-15 Thread Giuseppe Rossini via Apache TVM Discuss
Hi all, Thanks for the interesting discussion! So, we all agree that there are three points here:

* Backend API
* Calling convention
* Runtime API

As things stand today, memory allocation is part of the backend API. This will change with global memory planning, but for now I would tend to ski

[Apache TVM Discuss] [Development/RFC] Implementing AOT in TVM

2021-04-01 Thread Giuseppe Rossini via Apache TVM Discuss
FYI: I will be out for the Easter holidays until Tuesday (so I will reply to any comments as soon as I come back :slight_smile: ) --- [Visit Topic](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206/15) to respond. You are receiving this because you enabled mailing list

[Apache TVM Discuss] [Development] [RFC] Standalone Code Generation and C Runtime for STM32 bare-metal devices

2021-04-01 Thread Giuseppe Rossini via Apache TVM Discuss
Also, a side comment: I will be out for the Easter holidays until Tuesday (so I will reply to any comments as soon as I come back :slight_smile: ) --- [Visit Topic](https://discuss.tvm.apache.org/t/rfc-standalone-code-generation-and-c-runtime-for-stm32-bare-metal-devices/9562/8) to

[Apache TVM Discuss] [Development] [RFC] Standalone Code Generation and C Runtime for STM32 bare-metal devices

2021-04-01 Thread Giuseppe Rossini via Apache TVM Discuss
Hi all, I just published the AOT PR upstream: https://github.com/apache/tvm/pull/7785. It has some conflicts, probably due to the `CompileEngine` refactoring, and I will fix them soon. I just wanted to let you start having a look. @stoa, I am wondering how much of your work can use the A

[Apache TVM Discuss] [Development/RFC] Implementing AOT in TVM

2021-04-01 Thread Giuseppe Rossini via Apache TVM Discuss
Hi all, I was finally able to have a first version of the AOT work in a PR upstream.

## PR

You can find the PR here: https://github.com/apache/tvm/pull/7785

At this stage, I gladly accept any feedback on things that can be improved in the PR or on issues I might have overlooked. Please, help

[Apache TVM Discuss] [Development/RFC] Implementing AOT in TVM

2021-03-04 Thread Giuseppe Rossini via Apache TVM Discuss
Hi Andrew,

> for AOT runtime I agree we do not need JSON parsing or any of the underlying facilities it brings. However, given it seems like you’re planning to reuse the C-runtime memory allocator and interfaces in include/tvm/crt/platform.h, I think it would be great to continue using

[Apache TVM Discuss] [Development/RFC] [RFC] A general task extraction mechanism for auto_scheduler

2020-11-12 Thread Giuseppe Rossini via Apache TVM Discuss
Hi @comaniac, May I ask how the graph ends up with a `nn.conv2d + nn.relu + nn.conv2d + nn.relu`? Is the graph going through a BYOC kind of partitioning (sorry if the question is naive)? As for S1 vs S2, could we do both? Use a heuristic like "ignore the task without any call node" and th

[Apache TVM Discuss] [Development] Role of the LLVM autovectorizer in TVM

2020-11-06 Thread Giuseppe Rossini via Apache TVM Discuss
Hi all, I am trying to understand the role of the LLVM auto-vectorizer in TVM. Indeed, in `llvm_codegen.cc` we explicitly set:

```
builder.LoopVectorize = true;
builder.SLPVectorize = true;
```

And I am trying to determine to what level TVM is relying on LLVM auto-vectorization.

### Wh

[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-30 Thread Giuseppe Rossini via Apache TVM Discuss
Maybe I am wrong, but are you sure that, when `cfg.is_fallback` is set, parameters like `cfg['tile_co']` are not defined? We usually set them to some default values (I think). But even if we don't set them, IIUC they will get "some" value among the possible ones. Am I missing something? --- [Visit

[Apache TVM Discuss] [Development/RFC] [RFC]: Improve quantized convolution through mmla instruction

2020-10-30 Thread Giuseppe Rossini via Apache TVM Discuss
cc: @anijain2305, @FrozenGene, @matt-arm, @ramana-arm --- [Visit Topic](https://discuss.tvm.apache.org/t/rfc-improve-quantized-convolution-through-mmla-instruction/8336/2) to respond.

[Apache TVM Discuss] [Development/RFC] RFC: Improve quantized convolution through mmla instructions

2020-10-30 Thread Giuseppe Rossini via Apache TVM Discuss
## Introduction and motivation

This RFC is the third set of optimizations to enhance quantized convolution on Arm architectures. To give a brief summary:

* Basic Armv8-A convolution implementation (through gemm): https://discuss.tvm.apache.org/t/rfc-improve-quantized-convolution-performance-f
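For readers unfamiliar with the instruction named in the thread title, the following is a minimal sketch of the arithmetic that the unsigned variant, `ummla`, performs as documented in the Arm ISA. It is a plain-Python model for illustration only; the function name and the list-based register layout are assumptions of this sketch, and it is not taken from the RFC or from TVM.

```python
def ummla(acc, a, b):
    """Model of `ummla`: 2x8 by 8x2 unsigned 8-bit matrix multiply-accumulate.

    acc: 2x2 accumulator of uint32 values, row-major (4 values).
    a, b: 16 uint8 values each, read as two rows of eight.
    Element (i, j) of the result adds the dot product of row i of `a`
    with row j of `b` to the accumulator.
    """
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    out = list(acc)
    for i in range(2):
        for j in range(2):
            dot = sum(a[8 * i + k] * b[8 * j + k] for k in range(8))
            # Accumulators are 32 bits wide, so wrap on overflow.
            out[2 * i + j] = (acc[2 * i + j] + dot) & 0xFFFFFFFF
    return out


result = ummla([0, 0, 0, 0], [1] * 8 + [2] * 8, [1] * 8 + [3] * 8)
```

A single instruction therefore produces a 2x2 tile of 32-bit accumulators from sixteen 8-bit values per operand, which is why it is attractive for gemm-based quantized convolution.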

[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-30 Thread Giuseppe Rossini via Apache TVM Discuss
Hi @FrozenGene, I think I see why we don't want to change the layout for no workload (no workload means we don't even know the strategy, I think). What I am missing is why we don't want to change the layout when `cfg.is_fallback`. In that case, the strategy is defined, so we know how the weigh

[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-29 Thread Giuseppe Rossini via Apache TVM Discuss
Hi @FrozenGene, @anijain2305, I can confirm that this works :partying_face:! Very good! Now we can implement algorithms like QNNPack and let the tuner try them together! Thanks to both of you! As for the API change, I agree with @FrozenGene that maybe it would be cleaner to add `tinfos` to the `

[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-27 Thread Giuseppe Rossini via Apache TVM Discuss
I got a bit confused above, sorry. It is not about the `inputs` but about the `tinfos`. Just to avoid any additional confusion I tried to print the types of the interesting variables **conv2d_alter_op(attrs, inputs, tinfos, out_type)** ``` print(type(inputs[0])) # print(type(tinfos[0]))

[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-26 Thread Giuseppe Rossini via Apache TVM Discuss
Thanks for the reply, @FrozenGene! The signatures of the two functions are:

```
def _alter_conv2d_layout(attrs, inputs, types, out_type):
```

```
def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
```

While they look similar, `inputs` in `_alter_conv2d_layout` contains actual `Tensor`s

[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-22 Thread Giuseppe Rossini via Apache TVM Discuss
cc @anijain2305 @ramana-arm @FrozenGene (we had this discussion before) --- [Visit Topic](https://discuss.tvm.apache.org/t/quantized-models-and-legalization-pass/8253/2) to respond.

[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-22 Thread Giuseppe Rossini via Apache TVM Discuss
Hi all, I am trying to improve quantized performance for memory-bound operators (e.g., depthwise or 1x1 convolutions with small shapes).

### Bottom line question

Is there any way we can know the strategy picked by the autotuner during the legalization pass of a quantized convolution (qnn.co

[Apache TVM Discuss] [Development/RFC] [RFC] Optionally include object file generation in tvmc

2020-10-09 Thread Giuseppe Rossini via Apache TVM Discuss
From what I see, in `tvmc.compiler`, `export_library()` is called with a `mod.so` input. I agree we could directly generate the `tar` file, but I think this was done to avoid storing the `.c` files (@leandron will know more than me on this). As for storing directly in the dylib, I am not

[Apache TVM Discuss] [Development/RFC] [RFC] Optionally include object file generation in tvmc

2020-10-09 Thread Giuseppe Rossini via Apache TVM Discuss
Hi @tqchen, `tvmc` directly saves the `.so`, `.params` and `.json` in the `.tar` file it generates. This happens in `tvmc/compiler.py`. I might be wrong, but probably this is because it doesn't want to store the `.c` files in the final artifact (@leandron, can you confirm this?). ---

[Apache TVM Discuss] [Development/RFC] [RFC] Optionally include object file generation in tvmc

2020-10-08 Thread Giuseppe Rossini via Apache TVM Discuss
Hi @aca88, The object file produced by `tvmc` does not necessarily include the C runtime. Using a `--bare-metal` flag just refers to the fact that it is mostly useful on a bare-metal target. Anyway, to avoid confusion, I think maybe `--object-file` might be a better choice :slight_smile:

[Apache TVM Discuss] [Development/RFC] [RFC] Optionally include object file generation in tvmc

2020-10-08 Thread Giuseppe Rossini via Apache TVM Discuss
cc: @leandron, @ramana-arm --- [Visit Topic](https://discuss.tvm.apache.org/t/rfc-optionally-include-object-file-generation-in-tvmc/8120/2) to respond.

[Apache TVM Discuss] [Development/RFC] [RFC] Optionally include object file generation in tvmc

2020-10-08 Thread Giuseppe Rossini via Apache TVM Discuss
## Motivation

Currently `tvmc` will only produce a dynamic library version of the network, i.e., a `.so` file stored alongside the other artifacts. This library is usually dynamically linked to other applications. With this change we want to add a flag to `tvmc` to get an object file (i.e.,

[Apache TVM Discuss] [Development/RFC] [RFC] Accelerate quantized convolution through dot-product

2020-09-10 Thread Giuseppe Rossini via Apache TVM Discuss
cc @anijain2305, @FrozenGene, @ramana-arm --- [Visit Topic](https://discuss.tvm.apache.org/t/rfc-accelerate-quantized-convolution-through-dot-product/7873/2) to respond.

[Apache TVM Discuss] [Development/RFC] [RFC] Accelerate quantized convolution through dot-product

2020-09-10 Thread Giuseppe Rossini via Apache TVM Discuss
## Motivation

In recent RFCs we successfully boosted convolution performance on native Armv8-A architectures. When using Armv8.2-A and above ISAs, developers are provided with a richer set of instructions, among which the dot-product instruction `udot` (or `sdot`) can be particularly useful
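As a rough illustration of why the dot-product instruction helps here, the sketch below models what a 128-bit `udot` computes: each of the four 32-bit accumulator lanes receives the dot product of four unsigned 8-bit values from each input. This is a plain-Python model of the instruction's arithmetic, not code from the RFC; the function name and the list-based register layout are assumptions made for this example.

```python
def udot(acc, a, b):
    """Model of a 128-bit `udot`.

    acc: four uint32 accumulator lanes.
    a, b: sixteen uint8 values each, grouped four per lane.
    """
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    out = []
    for lane in range(4):
        dot = sum(a[4 * lane + i] * b[4 * lane + i] for i in range(4))
        # Lanes are 32 bits wide, so wrap on overflow.
        out.append((acc[lane] + dot) & 0xFFFFFFFF)
    return out


result = udot([0, 0, 0, 0], list(range(16)), [1] * 16)
```

The signed variant, `sdot`, has the same structure but interprets the 8-bit inputs and the accumulators as signed values. One instruction replaces a sequence of widening multiplies and pairwise additions, which is where the speed-up for 8-bit quantized convolution would come from.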