Hello Andrew @areusch
> In my mind, some setup function is needed to accomplish:
>
> 1. initializing memory set aside for tensors and parameters
> 2. configuring accelerators, including starting (possibly) backgrounded transfers of any programming/parameters.
>
> I think that the TVM function for this is the factory function (right now, typically `mod["default"]()`), and the X-Cube equivalent is `ai_[<model_name>_]create`. Does that match your understanding?

This is exact.

> Apologies, I think I was a bit confused before. IIUC, this port aims to implement an API aligned with the X-Cube API, at least for now only aiming to enable deployments to STM32 - does that also seem right to you? I'm curious whether this API aims to replace the C runtime and Model-based Module Runtime Interface for all targets, or if this would just be confined to STM32 for now.

;-) If I am ambitious, I would say replace it for **a family of embedded targets**. Sorry, I perhaps was not clear earlier. We have observed that several embedded tools have converged on such an API:

- [X-CUBE-AI](https://www.st.com/en/embedded-software/x-cube-ai.html), of course
- [TensorFlow Lite for Microcontrollers](https://www.tensorflow.org/lite/microcontrollers)
- NXP eIQ-GLOW [AOT NN Compiler](https://www.nxp.com/docs/en/user-guide/EIQGLOWAOTUG.pdf)

That seems a good argument for also aligning the TVM C API in this direction. We probably need to change the naming, perhaps to `tvm_ai_` instead of just `ai_`, but this is a detail. The important point is that there are a dozen methods common to the above APIs, and that memory management is left to the main application to handle (see the sketch further below).

I propose to start with the STM32 code emitter now and work together with the TIR-based AoT on converging to a common understanding. This will pave the way for us to move to the TIR-based code generator. We can perhaps also contribute to its development.

> Then the next questions I have would be around how you'd like to proceed with this going forward. At present, the STM32 generator PR you've proposed has several features that are missing from the microTVM compiler (e.g. memory pinning, AOT, etc.). As we implement these features, will it be possible to incorporate them into this generator as well (i.e. to take advantage of compiler-level improvements we might be able to make, such as graph-level optimization)?

This would be the plan. I can imagine a couple of things we can do with the TIR-based AoT that we cannot with our current code emitter.

> If so, it would be great to keep the STM32 API semantically similar to the TVM C runtime API, so that we can later invoke TVM C runtime APIs from the STM32 functions. I suspect these are pretty similar, but just want to understand the goals for code-reviewing your PR. One possible scenario is: when we have a TVM AOT runtime and memory pinning available, we could rework ai_create to instantiate the TVM C AOT runtime. It would also be great to use the STM32 API as inspiration to expand the TVM APIs to provide equivalent functionality. Please let me know your thoughts here!

This corresponds entirely to our vision. Great!

> So my question here is: in the future, would you be open to using a TVM-side implementation of a memory-pool, statically-allocated memory planner? I think it sounds like that'd be okay, but just confirming.

Yes. We will move away from the JSON graph and base the code emission on the TIR-based TVM structures, including the memory planner.
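To make the point about a converged embedded API concrete, here is a rough sketch of the kind of C entry points it could expose, using the proposed `tvm_ai_` prefix. All names and signatures here are illustrative only, not a finalized proposal; the key property is that every buffer is owned and placed by the main application:

```c
/* tvm_ai.h - hypothetical sketch of a converged embedded API surface.
 * None of these names or signatures are final. */
#include <stddef.h>

/* Opaque handle to one model instance. */
typedef void *tvm_ai_handle;

/* A tensor buffer whose storage is owned by the application
 * (linker section, static .data table, or heap - the app decides). */
typedef struct {
  void *data;
  size_t bytes;
} tvm_ai_buffer;

/* Setup: bind application-provided activation/parameter memory and
 * configure accelerators. Plays the role of TVM's factory function
 * (mod["default"]()) and of X-CUBE-AI's ai_<model>_create. */
tvm_ai_handle tvm_ai_create(const tvm_ai_buffer *activations,
                            const tvm_ai_buffer *params);

/* Run one inference over application-owned inputs/outputs. */
int tvm_ai_run(tvm_ai_handle model,
               const tvm_ai_buffer *inputs, size_t num_inputs,
               tvm_ai_buffer *outputs, size_t num_outputs);

/* Tear-down; frees nothing the application allocated itself. */
int tvm_ai_destroy(tvm_ai_handle model);
```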
> When we do tensor pinning, I think it's likely I'll propose to add some tensor_id (note: different from storage_id, as a storage_id could contain multiple tensor_ids) to TVMBackendAllocWorkspace, and a lookup table could just return a pointer into the pre-allocated memory pool. TVMBackendFreeWorkspace would become a no-op. Will that work for you guys?

That works for us. Just keep in mind that these memory pools should remain open to static allocation as a section via a linker script, to static allocation as a table from the main application (.data), and to dynamic allocation via whatever allocator the application may choose (see the sketch in the P.S. below).

> - consider removing the need to use PackedFunc looked-up by string name, and instead provide more natural C wrappers around those functions

This is already the case.

> - consider creating a mapping from PackedFunc string name to a global symbol name to shortcut this lookup, as they won't likely be dynamically overridden in embedded applications.

We will add an API method for such a lookup, implementing the mapping.

> Would it be possible to check in a docker container, e.g. tlcpack/ci-stm32, which could run this in our CI? Then we can just make it a first-class example and place it in apps/microtvm/stm32 or a similar sub-directory of microtvm of your choosing.

Yes, noted. The Module Library Format seems not fully finalized yet ;-) That's fine: I will generate the structure as per your RFC proposal (no `crt`), and we can refine it from there. This is a minor detail.

### Actions for us:

Re-submit the PR with the following:

1. Move to generating the Module Library Format (as it stands for now).
2. Provide the Docker container and a test application for the sanity CI.
3. Move to the Project API on the demo side (structure + `microtvm_api_server.py`), implementing the Standalone Demo Project Generator based on your [PoC](https://github.com/areusch/incubator-tvm/commit/b86d40a66894c08e74c952f42fd600efbe351625).

We will continue the discussion on the C runtime API; how should we involve the AoT people? We can contribute to the development if necessary. Does this work for you?

Cheers,
Arthur
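P.S. For reference, a minimal sketch of how the pinned-pool lookup discussed above could look on the application side. The `tensor_id` argument and the `*Pinned` function names are hypothetical (the extra argument is the proposed extension, not the current TVM signature), and the offsets stand in for whatever the static memory planner emits. The pool shown here is placed via a linker-script section, but a plain `.data` table or a dynamic allocation would work equally well:

```c
#include <stdint.h>
#include <stddef.h>

#define POOL_BYTES 65536 /* total size, as computed by the memory planner */

/* Memory pool placed by the link script; could equally be a plain
 * static array in .data or come from an application allocator. */
static uint8_t g_tensor_pool[POOL_BYTES]
    __attribute__((section(".nn_tensor_pool")));

/* Per-tensor offsets emitted by the static memory planner. */
static const size_t g_tensor_offsets[] = {0, 4096, 12288 /* ... */};

/* Hypothetical pinned variant of TVMBackendAllocWorkspace with the
 * proposed extra tensor_id: a table lookup into the pre-allocated pool. */
void *TVMBackendAllocWorkspacePinned(int device_type, int device_id,
                                     uint64_t nbytes, int tensor_id) {
  (void)device_type;
  (void)device_id;
  (void)nbytes; /* the size was fixed at planning time */
  return &g_tensor_pool[g_tensor_offsets[tensor_id]];
}

/* Freeing becomes a no-op under pinning: nothing was allocated. */
int TVMBackendFreeWorkspacePinned(int device_type, int device_id, void *ptr) {
  (void)device_type;
  (void)device_id;
  (void)ptr;
  return 0;
}
```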