Great, a few final clarifications.
> The Module Library Format seems not fully finalized yet :wink: That's fine. I
> will generate the structure as per your RFC proposal (no crt), and we can
> refine it from there. This is a minor detail.

It is somewhat of a living standard, but it's versioned. If you have tests for your implementation, we will run them as we make changes and bump the Model Library Format version.

One clarification we do need to make here: Model Library Format is generated with the function `tvm.micro.export_model_library_format`, and the generated directory tree is given as an argument in [Project API](https://discuss.tvm.apache.org/t/rfc-tvm-project-api/9449) to `generate_project`. I think you should just need to modify your codegen to consume Model Library Format rather than also making a generator for it. Sorry if that was unclear, and let me know if something seems fundamentally broken with that approach. Right now, Model Library Format includes graph executor configuration and so suggests the executor that should be used. I think you can just ignore that piece and/or use it to drive your codegen.

With all this said, we just have a [PoC](https://github.com/areusch/incubator-tvm/commit/b86d40a66894c08e74c952f42fd600efbe351625) of Project API we're developing now. Currently there is just a [demo](https://github.com/areusch/incubator-tvm/commit/b86d40a66894c08e74c952f42fd600efbe351625#diff-54b391a3e85f1bac817634088c77742416fbc7c8bf551d823524f80cd577464d) of an implementation for the host C runtime. The remaining items before committing the PoC are:

- Develop the Zephyr API implementation
- Migrate apps/bundle_deploy to use Project API

I'll try to post the Zephyr implementation as a (loose) example (e.g. the Zephyr impl would not do runtime generation nor memory pinning) of what I'm thinking for STM32 codegen by end-of-week. Let me know what you think of this approach. We could expand the content of Model Library Format if that was necessary for an STM32 implementation. The benefit of doing this is that autotuning is going to use Project API to drive the build/flash/timing pipeline, so it would be a more natural shift as we move towards that.

There is one additional detail not yet ironed out: the code you would want to generate for autotuning is very different from that you'd want to generate for inference. My vision for this was to have two different project generators (e.g. `apps/microtvm/stm32/inference` and `apps/microtvm/stm32/autotune`). In this proposal, the `inference` project would essentially be implemented as you guys have done now, and `autotune` would need to include the TVM RPC server and logic to drive the RPC transport over e.g. UART, USB, etc. Let me know what you think of this idea.

> I propose to start with the STM32 code emitter now and work together with
> the TIR-based AoT on converging to a common understanding. This will pave the
> way for us to move to the TIR-based code generator. We can perhaps also
> contribute to its development.

Great, that sounds good. Let's discuss the API convergence in a follow-on RFC. I'm not sure I see exact unification on naming across frameworks, but I agree that the structure of our API is a bit divergent from the other embedded AI platforms. The API change will affect many, so we'll need to have a focused discussion and loop in quite a few others.

@giuseros @ramana-arm, possible to give an update on the AOT progress?
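To make the Model Library Format → Project API hand-off above a bit more concrete, here is a rough sketch of the Python side using a toy Relay model. The `generate_project` entry point, the `apps/microtvm/stm32/inference` template path, and the `board` option are illustrative and modeled on the PoC, not a finalized API:

```python
import os

import numpy as np
import tvm
from tvm import relay

# Toy Relay model standing in for the real network (illustrative only).
x = relay.var("x", shape=(1, 8), dtype="float32")
w = relay.const(np.random.rand(8, 8).astype("float32"))
relay_mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.dense(x, w)))

# Build for a bare-metal C target; the exact target string depends on the board.
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    module = relay.build(relay_mod, target="c -runtime=c")

# Export Model Library Format (depending on the version this is a .tar or a
# directory tree). This artifact is what the STM32 codegen would consume,
# rather than generating its own layout.
os.makedirs("build", exist_ok=True)
tvm.micro.export_model_library_format(module, "build/model.tar")

# Hand the exported artifact to a template project via Project API. This call
# is modeled on the PoC and is illustrative only.
tvm.micro.generate_project(
    "apps/microtvm/stm32/inference",   # proposed template project directory
    module,
    "build/stm32_demo_project",        # where the standalone project is written
    options={"board": "stm32f746g_disco"},
)
```

The idea is that the STM32 emitter runs inside the template project's `generate_project` hook, consuming the exported tree rather than producing its own copy of it.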
>> When we do tensor pinning, I think it's likely I'll propose to add some
>> tensor_id (note: different from storage_id, as storage_id could contain
>> multiple tensor_id) to TVMBackendAllocWorkspace, and a lookup table could
>> just return a pointer into the pre-allocated memory pool.
>> TVMBackendFreeWorkspace would become a no-op. Will that work for you guys?
>
> That is good. Just keep in mind that these memory pools should be open to a
> static allocation as a section via a link script, to a static allocation as a
> table from the main application (.data), and to the dynamic allocation via
> whatever allocator the application may choose.

Yeah, this is all part of that. In particular, some accelerators may need a subset of parameters to live in a memory pool placed at a fixed address for faster loading at startup.

> > * consider removing the need to use PackedFunc looked-up by string name,
> > and instead provide more natural C wrappers around those functions
>
> Already the case.
>
> - We will add an API method for such lookup implementing the mapping.

Here, my goal is just to implement simpler code generation for `tir.call_packed` nodes that avoids a string lookup at inference time (i.e. avoids calling `TVMBackendGetFuncFromEnv`).

> ### Actions for us:
>
> Re-submit the PR with this:
>
> 1. Move to generating Module Library Format (as it is for now).
> 2. Provide the docker and a test application for the sanity CI.
> 3. Move to Project API on the demo side (structure + `microtvm_api_server.py`)
>    implementing the Standalone Demo Project Generator based on your
>    [PoC](https://github.com/areusch/incubator-tvm/commit/b86d40a66894c08e74c952f42fd600efbe351625).
>
> We continue discussion on the C runtime API, how to involve the AoT people?
> We can contribute to the development if necessary.
>
> Does this work for you?

Aside from (1), which I think can be generated with `tvm.micro.export_model_library_format`, that seems like a great plan to me! I've tagged the AOT implementers; hopefully they can give a status update here.
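For item (3), here is a rough skeleton of what an STM32 `microtvm_api_server.py` could look like, following the structure of the PoC. The base class, hook names, and `ServerInfo` fields below are taken from the PoC and may change before it is committed, so treat this as a sketch rather than a reference implementation:

```python
# microtvm_api_server.py -- sketch of a Standalone Demo Project Generator for
# STM32, modeled on the Project API PoC. All names are illustrative.
import pathlib
import shutil
import subprocess

from tvm.micro.project_api import server


class Handler(server.ProjectAPIHandler):
    def server_info_query(self, tvm_version):
        # Describe this template project; user-facing options (e.g. the target
        # board) would be declared in project_options.
        return server.ServerInfo(
            platform_name="stm32",
            is_template=True,
            model_library_format_path="",
            project_options=[],
        )

    def generate_project(self, model_library_format_path, standalone_crt_dir, project_dir, options):
        # Copy in Model Library Format and run the STM32 emitter against it.
        project_dir = pathlib.Path(project_dir)
        project_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(model_library_format_path, project_dir / "model.tar")
        # ... invoke the STM32 codegen on the extracted Model Library Format tree ...

    def build(self, options):
        subprocess.run(["make", "-C", "build"], check=True)

    def flash(self, options):
        subprocess.run(["make", "-C", "build", "flash"], check=True)

    def open_transport(self, options):
        # Only needed for the autotune flavor, which hosts the TVM RPC server.
        raise NotImplementedError()

    def close_transport(self):
        raise NotImplementedError()

    def read_transport(self, n, timeout_sec):
        raise NotImplementedError()

    def write_transport(self, data, timeout_sec):
        raise NotImplementedError()


if __name__ == "__main__":
    server.main(Handler())
```

In the two-generator proposal above, the `inference` flavor could leave the transport hooks unimplemented, while the `autotune` flavor would implement them to drive the RPC transport over e.g. UART or USB.

-Andrew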