hi @stoa,
Thanks for the elaborate RFC here! You bring up a bunch of great points. This is a really strong proposal and I think overall fairly well aligned with the direction I want to take microTVM. Particularly since similar code has been posted to the forum before, it would be great to have a discussion around the implementation details here.

For the purposes of discussion, let's break this proposal apart into pieces:

P1. Code Emitter (e.g. Executor implementation or GraphRuntime replacement)
P2. Tensor memory allocation
P3. The firmware-facing API

Finally, I'd like to discuss ways to reduce code duplication and avoid splintering the overall design of µTVM. In particular, it seems like this could become a Project API implementation. I'll leave some thoughts below on each piece.

### Code Emitter

This approach is similar to some others posted to the forum before:

- [µTVM Static Code Generator](https://discuss.tvm.apache.org/t/tvm-static-runtime-code-generator/8986) by @r.stahl
- [my hack to do this](https://github.com/areusch/incubator-tvm/tree/aot-experiment)

In general, I think the direct-to-C++ route (as compared with the TIR route) is simple and easy to hack on, but the TIR route lends us more avenues for graph-level optimization. However, I don't think that the accessibility should be understated--tvm has a pretty steep learning curve. I think the challenge with checking this code into the TVM repository is testing and maintenance, as I'll discuss later.

### Tensor Memory Allocation

This looks very similar to what I'd propose we implement in the TIR-based GraphPlanMemory pass. A couple of thoughts:

- Does your approach handle workspace memory, allocated inside kernels (e.g. TVMBackendAllocWorkspace)?
- Could you say more about "it may be necessary that two models share their '*activation*' pools"? Are these separate instances of the same model or two different models?

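
To make the workspace question concrete: one common way to serve kernel scratch allocations on bare metal is a fixed, statically sized pool with stack-like (LIFO) allocation. Below is a minimal sketch of that idea, not a description of what the RFC or TVM actually does. `workspace_alloc` and `workspace_free` are hypothetical stand-ins loosely shaped like `TVMBackendAllocWorkspace` / `TVMBackendFreeWorkspace` (the device and dtype arguments are dropped), and the pool size and alignment are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define WORKSPACE_POOL_BYTES 1024u
#define WORKSPACE_ALIGN 8u

/* Statically allocated pool; the union member forces 8-byte alignment. */
static union {
  uint64_t align;
  uint8_t bytes[WORKSPACE_POOL_BYTES];
} g_pool;

/* Offset of the first free byte in the pool. */
static size_t g_pool_top = 0;

/* Round n up to the next multiple of WORKSPACE_ALIGN. */
static size_t align_up(size_t n) {
  return (n + (WORKSPACE_ALIGN - 1u)) & ~(size_t)(WORKSPACE_ALIGN - 1u);
}

/* Hypothetical allocation hook: bumps the pool top.
 * Returns NULL when the request does not fit. */
void* workspace_alloc(size_t nbytes) {
  size_t size = align_up(nbytes);
  if (size < nbytes || size > WORKSPACE_POOL_BYTES - g_pool_top) {
    return NULL;
  }
  void* ptr = &g_pool.bytes[g_pool_top];
  g_pool_top += size;
  return ptr;
}

/* Hypothetical free hook: rewinds the pool top to ptr, so frees must
 * happen in reverse allocation order. Returns 0 on success, -1 if ptr
 * does not point into the currently allocated region. */
int workspace_free(void* ptr) {
  uintptr_t p = (uintptr_t)ptr;
  uintptr_t base = (uintptr_t)g_pool.bytes;
  if (ptr == NULL || p < base || p >= base + g_pool_top) {
    return -1;
  }
  g_pool_top = (size_t)(p - base);
  return 0;
}
```

Whether a scheme like this is enough depends on kernels freeing their workspaces in reverse order of allocation, which is the part I'd want to confirm against the generated code.
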
### Firmware-facing API

TVM does have a standard object-oriented [Module-based Model Runtime Interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) RFC. This one is based around our PackedFunc concept, heavily used in the C++ runtime as a language-agnostic abstraction. In firmware we certainly don't need such an abstraction. Somewhat related, [issue 7596](https://github.com/apache/tvm/issues/7596) is considering how to implement PackedFunc calls in the C backend.

Next, I agree that the C runtime API isn't very friendly for firmware developers. There are a couple of pieces here:

1. PackedFuncs are looked up by string name. This is inefficient in terms of both memory and runtime. I think we still need to maintain that string lookup to keep compatibility with the RPC server implementation which drives autotuning. However, I wonder if we might consider making it a convention to implement PackedFunc with particular symbol names so that they could be called directly in production without string lookup.
2. Arguments and return values need to be wrapped in TVMValue. I don't think we can get around this one, but we could implement wrappers to the firmware-facing executor functions to simplify this.

I wonder if there are other differences or critiques you could find of the C runtime that would improve it? It would be great to at least standardize the runtime between these two implementations. This would be in a follow-on RFC, though.

### Code Emitter vs TIR-based approach

Given that a number of features implemented in this RFC are on the µTVM roadmap (but intended to be implemented at the TIR level), I think the main difference in the long run here is that this RFC directly generates C++ code rather than passing TIR to the `c` backend. I think there are merits to both this approach and the TIR-based AOT being implemented by @giuseros.
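
Circling back to points 1 and 2 under Firmware-facing API, here is a rough sketch of the symbol-name convention plus a typed wrapper. Everything in it is a simplified stand-in: `Value`, `packed_fn`, `model_add`, `lookup_by_name` and `model_add_i64` are hypothetical names, and the packed signature is trimmed relative to the real TVM C ABI (which also carries type codes and a resource handle).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for TVMValue (the real union has more members). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} Value;

/* Simplified packed calling convention: returns 0 on success. */
typedef int (*packed_fn)(Value* args, int num_args, Value* ret);

/* A "packed" function exported under a predictable symbol name. */
int model_add(Value* args, int num_args, Value* ret) {
  if (num_args != 2) {
    return -1;
  }
  ret->v_int64 = args[0].v_int64 + args[1].v_int64;
  return 0;
}

/* String-keyed registry, kept around for an RPC-style caller. */
typedef struct {
  const char* name;
  packed_fn fn;
} registry_entry;

static const registry_entry g_registry[] = {
    {"model_add", model_add},
};

/* Linear search by name: costs flash for the strings and cycles per call. */
packed_fn lookup_by_name(const char* name) {
  for (size_t i = 0; i < sizeof(g_registry) / sizeof(g_registry[0]); ++i) {
    if (strcmp(g_registry[i].name, name) == 0) {
      return g_registry[i].fn;
    }
  }
  return NULL;
}

/* Typed wrapper for production firmware: calls the symbol directly
 * and hides the Value packing from the caller. */
int model_add_i64(int64_t a, int64_t b, int64_t* out) {
  Value args[2];
  Value ret;
  args[0].v_int64 = a;
  args[1].v_int64 = b;
  int rc = model_add(args, 2, &ret);
  if (rc == 0) {
    *out = ret.v_int64;
  }
  return rc;
}
```

Both paths end up in the same function: an RPC server could keep using `lookup_by_name`, while production firmware links against `model_add_i64` and could drop the registry entirely.
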
As discussed in the Code Emitter section, I do think that the TIR-based approach gives us more future avenues to develop µTVM. However, I don't want to ignore how accessible approaches like these are. Relative to `main` right now, this RFC has a bunch of things that we don't have: AOT, memory pinning, API changes. It seems like we could allow an implementation like this to coexist as a Project API with roughly these steps:

1. Rework the PoC to consume Model Library Format and implement the Project API. Regarding the question of whether this should be applicable to autotuning or also to deployment: my thought was that this would be decided by the Project API implementation (either create an option or a separate implementation for each scenario).
2. When available, use the TIR-based comprehensive memory planner (it seems nearly identical to the one you've implemented, and would generate JSON describing the memory pools).
3. Ensure at least the TVMBackend* functions are used from the C runtime, which provides a pathway to migrate to the TIR-based memory planner and avoids diverging too far in terms of generated code.

Finally, I'd also propose we consider simplifying the C runtime API as discussed in the Firmware-facing API section.

### Testing and Code Location

Could you speak a bit more to how this code could be tested in the TVM CI? That's my chief concern with checking it in as a Project API implementation. I posted some [thoughts](https://discuss.tvm.apache.org/t/rfc-tvm-project-api/9449/4) about the bar to checking in Project API implementations to the tvm repo.

Some discussion points:

D1. Between this approach and a TIR-based AOT, do you guys have a preference for which one to work with, assuming both were implemented?

D2. While the Python APIs are perfectly fine, one goal of Model Library Format is to enable downstream tools such as this to work with TVM with less API drift.
Do you guys prefer the Python API, or would this also be an interface you'd be open to consuming?

D3. In general, the challenge with checking code such as this into the TVM repo is testing. Particularly with bare-metal code, it's hard to test without hardware in the loop, and the TVM CI doesn't really have a provision for that now. Do you guys have a proposal for how we might test this code?