Hello Andrew @areusch

>In my mind, some setup function is needed to accomplish:
>
> 1. initializing memory set aside for tensors and parameters
> 2. configuring accelerators, including starting (possibly) backgrounded 
> transfers of any programming/parameters.
>
> I think that the TVM function for this is the factory function (right now, 
> typically mod["default"]()), and the X-Cube equivalent is 
> ai_[<model_name>_]create. Does that match your understanding?

That is exactly right.

>Apologies, I think I was a bit confused before. IIUC, I think this port aims 
>to implement an API aligned with the X-Cube API, at least for now only aiming 
>to enable deployments to STM32–does that also seem right to you? I’m curious 
>whether this API aims to replace the C runtime and Model-based Module Runtime 
>Interface for all targets or if this would just be confined to STM32 for now.

;-) If I am being ambitious, I would say replace it for **a family of embedded
targets**. Sorry, perhaps I was not clear earlier.
We have observed that several embedded tools have converged on such an API:
- [X-CUBE-AI](https://www.st.com/en/embedded-software/x-cube-ai.html), of course
- [TensorFlow Lite for Microcontrollers](https://www.tensorflow.org/lite/microcontrollers)
- NXP eIQ-GLOW [AOT NN Compiler](https://www.nxp.com/docs/en/user-guide/EIQGLOWAOTUG.pdf)

That seems like a good argument for aligning the TVM C API in this direction as well.
We would probably need to change the naming, perhaps `tvm_ai_` instead of
just `ai_`, but this is a detail. The important point is that there are a
dozen or so methods common to the above APIs, and that memory management is
left to the main application to handle.
I propose to start with the STM32 code emitter now and to work together with
the TIR-based AoT effort on converging on a common understanding. This will
pave the way for us to move to the TIR-based code generator. We can perhaps
also contribute to its development.
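
For illustration, here is a minimal sketch of what such a shared API surface could look like. Every name and signature below is hypothetical (the `tvm_ai_` prefix is just the suggestion above), not the actual X-CUBE-AI or TVM API:

```c
#include <stddef.h>

/* Hypothetical sketch of a common embedded inference API in the spirit of
 * X-CUBE-AI / TFLM; all names here are illustrative, not an existing API. */
typedef void *tvm_ai_handle;   /* opaque model instance */

typedef struct {
  void  *data;                 /* tensor storage, owned by the application */
  size_t size;                 /* size in bytes */
} tvm_ai_buffer;

/* The application supplies all memory: activations, params, and I/O. */
tvm_ai_handle tvm_ai_create(const tvm_ai_buffer *activations,
                            const tvm_ai_buffer *params);
int  tvm_ai_run(tvm_ai_handle model,
                const tvm_ai_buffer *inputs, size_t num_inputs,
                tvm_ai_buffer *outputs, size_t num_outputs);
void tvm_ai_destroy(tvm_ai_handle model);
```

The key property shared by all three tools above is that the runtime itself never allocates: all buffers come in from the caller.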

> Then the next questions I have would be around how you’d like to proceed with 
> this going forward. At present, the STM32 generator PR you’ve proposed has 
> several features that are missing from the microTVM compiler (e.g. memory 
> pinning, AOT, etc). As we implement these features, will it be possible to 
> incorporate them into this generator as well (I.e. to take advantage of 
> compiler-level improvements we might be able to make, such as graph-level 
> optimization)?

This would be the plan. I can imagine a couple of things we can do with the
TIR-based AoT that we cannot with our current code emitter.

> If so, it would be great to keep the STM32 API semantically similar to the 
> TVM C runtime API, so that we can later invoke TVM C runtime APIs from the 
> STM32 functions. I suspect these are pretty similar, but just want to 
> understand the goals for code-reviewing your PR. One possible scenario is: 
> when we have a TVM AOT runtime and memory pinning available, we could rework 
> ai_create to instantiate the TVM C AOT runtime. It would also be great to use 
> the STM32 API as inspiration to expand the TVM APIs to provide equivalent 
> functionality. Please let me know your thoughts here!

This corresponds entirely to our vision. Great!

> So my question here is: in the future, would you be open to using a TVM-side 
> implementation of a memory-pool, statically-allocated memory planner? I think 
> it sounds like that’d be okay, but just confirming.

Yes. We will move away from the JSON graph and base the code emission on the
TIR-based TVM structures, including the memory planner.

> When we do tensor pinning, I think it’s likely I’ll propose to add some 
> tensor_id (note: different from storage_id, as storage_id could contain 
> multiple tensor_id) to TVMBackendAllocWorkspace, and a lookup table could 
> just return a pointer into the pre-allocated memory pool. 
> TVMBackendFreeWorkspace would become a no-op. Will that work for you guys?

That is good. Just keep in mind that these memory pools should be open to
static allocation as a section via a linker script, to static allocation as
an array in the main application (.data), and to dynamic allocation via
whatever allocator the application may choose.
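
Concretely, a sketch of the three options (the pool name, size, and section name are hypothetical):

```c
#include <stdint.h>
#include <stdlib.h>

#define TVM_POOL_SIZE 4096  /* illustrative pool size */

/* 1. Static allocation in a dedicated section, placed by the linker script: */
static uint8_t tvm_pool_section[TVM_POOL_SIZE]
    __attribute__((section(".tvm_pool")));

/* 2. Static allocation as a plain array in the main application (.data/.bss): */
static uint8_t tvm_pool_static[TVM_POOL_SIZE];

/* 3. Dynamic allocation via whatever allocator the application chooses: */
static uint8_t *tvm_pool_dynamic;

void app_init(void) {
  tvm_pool_dynamic = (uint8_t *)malloc(TVM_POOL_SIZE);
}
```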

> - consider removing the need to use PackedFunc looked-up by string name, and 
> instead provide more natural C wrappers around those functions

This is already the case.

> - consider creating a mapping from PackedFunc string name to a global symbol 
> name to shortcut this lookup, as they won’t likely be dynamically overridden 
> in embedded applications.

We will add an API method that implements this mapping for the lookup.
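
As a rough sketch of what we have in mind (the function type and table entries are hypothetical; the real entries would be emitted by the code generator):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical packed-function type; the actual TVM signature may differ. */
typedef int (*tvm_packed_cfunc)(void *args, int *type_codes, int num_args);

typedef struct {
  const char      *name;  /* PackedFunc string name */
  tvm_packed_cfunc func;  /* corresponding global symbol */
} tvm_func_entry;

/* Entries would be generated at compile time, e.g.:
 *   { "tvmgen_default_run", &tvmgen_default_run },               */
static const tvm_func_entry g_func_table[] = {
  { NULL, NULL }  /* sentinel */
};

static tvm_packed_cfunc tvm_func_lookup(const char *name) {
  for (const tvm_func_entry *e = g_func_table; e->name != NULL; ++e) {
    if (strcmp(e->name, name) == 0) return e->func;
  }
  return NULL;  /* not found */
}
```

Since the table is built at compile time, the string lookup can even be bypassed entirely when the caller links directly against the global symbols.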

> Would it be possible to checkin a docker container e.g. tlcpack/ci-stm32 
> which could run this in our CI? Then we can just make it a first-class 
> example and place in apps/microtvm/stm32 or a similar sub-directory of 
> microtvm of your choosing.

Yes. Noted.

The Module Library Format does not seem fully finalized yet ;-)
That's fine. I will generate the structure as per your RFC proposal (no crt),
and we can refine it from there. This is a minor detail.

### Actions for us:

We will re-submit the PR with the following changes:

1. Move to generating the Module Library Format (as it stands today).
2. Provide the Docker container and a test application for the CI sanity checks.
3. Move to the Project API on the demo side (structure + `microtvm_api_server.py`),
   implementing the Standalone Demo Project Generator based on your
   [PoC](https://github.com/areusch/incubator-tvm/commit/b86d40a66894c08e74c952f42fd600efbe351625).

Let's continue the discussion on the C runtime API; how should we involve the AoT people?
We can contribute to the development if necessary.

Does this work for you?

Cheers

Arthur