Hi @stoa,

Thanks for the elaborate RFC here! You bring up a bunch of great points. 

This is a really strong proposal, and I think it is overall fairly well aligned
with the direction I want to take microTVM. Particularly since similar code has
been posted to the forum before, it would be great to have a discussion around
the implementation details here.

For the purposes of discussion, let's break this proposal apart into pieces:

P1. Code Emitter (e.g. Executor implementation or GraphRuntime replacement)

P2. Tensor memory allocation 

P3. The firmware-facing API

Finally, I'd like to discuss ways to reduce code duplication and avoid 
splintering the overall design of µTVM. In particular, it seems like this could 
become a Project API implementation. I'll leave some thoughts below on each 
piece.

### Code Emitter

This approach is similar to some others posted to the forum before:
- [µTVM Static Code Generator](https://discuss.tvm.apache.org/t/tvm-static-runtime-code-generator/8986) by @r.stahl
- [my hack to do this](https://github.com/areusch/incubator-tvm/tree/aot-experiment)

In general, I think the direct-to-C++ route (as compared with the TIR route) is 
simple and easy to hack on, but the TIR route lends us more avenues for 
graph-level optimization. However, I don't think the accessibility of the 
direct route should be understated: TVM has a pretty steep learning curve. I 
think the challenge with checking this code into the TVM repository is testing 
and maintenance, as I'll discuss later.

### Tensor Memory Allocation

This looks very similar to what I'd propose we implement in the TIR-based 
GraphPlanMemory pass. A couple of thoughts:

- Does your approach handle workspace memory allocated inside kernels (e.g. 
via `TVMBackendAllocWorkspace`)?
- Could you say more about "it may be necessary that two models share their 
'*activation*' pools"? Are these separate instances of the same model or two 
different models?
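
To make the workspace question concrete, here is a minimal sketch of how kernel workspace requests could be served from a statically sized pool. The two function signatures follow `c_backend_api.h` as I understand it, so please check them against your checkout. The pool size, the alignment, and the stack-style (LIFO) policy are assumptions for illustration only, not something the RFC or the C runtime prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizing: a real pool would be sized by the memory planner. */
#define WORKSPACE_POOL_BYTES 1024u
#define WORKSPACE_ALIGN 8u

/* Backing storage declared as uint64_t so it is 8-byte aligned. */
static uint64_t g_pool_storage[WORKSPACE_POOL_BYTES / 8u];
static size_t g_pool_top = 0; /* number of bytes currently in use */

/* Serve a workspace request by bumping the top of the pool. */
void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes,
                               int dtype_code_hint, int dtype_bits_hint) {
  (void)device_type;
  (void)device_id;
  (void)dtype_code_hint;
  (void)dtype_bits_hint;
  if (nbytes == 0) {
    nbytes = 1; /* give every allocation a distinct address */
  }
  /* Round the request up to the pool alignment. */
  uint64_t rounded =
      (nbytes + (WORKSPACE_ALIGN - 1u)) & ~(uint64_t)(WORKSPACE_ALIGN - 1u);
  if (rounded < nbytes || rounded > WORKSPACE_POOL_BYTES - g_pool_top) {
    return NULL; /* arithmetic overflow or pool exhausted */
  }
  void* ptr = (uint8_t*)g_pool_storage + g_pool_top;
  g_pool_top += (size_t)rounded;
  return ptr;
}

/* Release a workspace. This sketch assumes frees arrive in reverse
 * allocation order, so releasing `ptr` simply rewinds the pool to it. */
int TVMBackendFreeWorkspace(int device_type, int device_id, void* ptr) {
  (void)device_type;
  (void)device_id;
  uintptr_t base = (uintptr_t)g_pool_storage;
  uintptr_t p = (uintptr_t)ptr;
  if (p < base || p >= base + g_pool_top) {
    return -1; /* not a live allocation from this pool */
  }
  g_pool_top = (size_t)(p - base);
  return 0;
}
```

The LIFO assumption keeps the allocator free of per-block metadata, but it only holds if kernels release workspaces in the reverse order they acquired them. A planner that knows all workspace sizes ahead of time could replace this with fixed offsets.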

### Firmware-facing API

TVM does have a standard object-oriented [Module-based Model Runtime Interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) 
RFC. This one is based around our PackedFunc concept, heavily used in the C++ 
runtime as a language-agnostic abstraction. In firmware we certainly don't need 
such an abstraction. Somewhat related, [issue 7596](https://github.com/apache/tvm/issues/7596) 
is considering how to implement PackedFunc calls in the C backend.

Next, I agree that the C runtime API isn't very friendly for firmware 
developers. There are a couple of pieces here:

1. PackedFuncs are looked up by string name. This is inefficient in terms of 
both memory and runtime. I think we still need to maintain that string lookup 
to keep compatibility with the RPC server implementation which drives 
autotuning. However, I wonder if we might consider making it a convention to 
implement PackedFuncs with particular symbol names so that they could be called 
directly in production without string lookup.
2. Arguments and return values need to be wrapped in TVMValue. I don't think we 
can get around this one, but we could implement wrappers to the firmware-facing 
executor functions to simplify this.
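
Both points in the list above can be sketched in a few lines of C. Everything prefixed `Demo`/`demo_` here is a hypothetical stand-in: `DemoValue` is a cut-down imitation of `TVMValue`, the type codes are made up, and only the six-argument shape of the packed function follows the C backend's packed-function convention as I understand it. Treat it as an illustration of the idea, not as the C runtime's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for TVMValue. */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

/* Hypothetical type codes, not TVM's real ones. */
enum { DEMO_ARG_INT = 0, DEMO_ARG_HANDLE = 1, DEMO_ARG_NULL = 2 };

typedef int (*DemoPackedCFunc)(DemoValue* args, int* type_codes, int num_args,
                               DemoValue* out_ret_value, int* out_ret_tcode,
                               void* resource_handle);

/* A packed function exported under a well-known symbol name.
 * Toy "model": out[i] = 2 * in[i]; returns the element count. */
int demo_model_run(DemoValue* args, int* type_codes, int num_args,
                   DemoValue* out_ret_value, int* out_ret_tcode,
                   void* resource_handle) {
  (void)resource_handle;
  if (num_args != 3 || type_codes[0] != DEMO_ARG_HANDLE ||
      type_codes[1] != DEMO_ARG_HANDLE || type_codes[2] != DEMO_ARG_INT) {
    return -1;
  }
  const float* in = (const float*)args[0].v_handle;
  float* out = (float*)args[1].v_handle;
  int64_t n = args[2].v_int64;
  for (int64_t i = 0; i < n; ++i) {
    out[i] = in[i] * 2.0f;
  }
  out_ret_value->v_int64 = n;
  *out_ret_tcode = DEMO_ARG_INT;
  return 0;
}

/* String-keyed lookup, kept for the RPC/autotuning path. */
typedef struct {
  const char* name;
  DemoPackedCFunc func;
} DemoRegistryEntry;

static const DemoRegistryEntry g_registry[] = {
    {"demo_model_run", demo_model_run},
};

DemoPackedCFunc demo_lookup(const char* name) {
  for (size_t i = 0; i < sizeof(g_registry) / sizeof(g_registry[0]); ++i) {
    if (strcmp(g_registry[i].name, name) == 0) {
      return g_registry[i].func;
    }
  }
  return NULL;
}

/* Firmware-facing wrapper: hides the value packing and calls the symbol
 * directly, with no string lookup at runtime. */
int demo_model_run_typed(const float* in, float* out, int64_t n) {
  DemoValue args[3];
  int type_codes[3] = {DEMO_ARG_HANDLE, DEMO_ARG_HANDLE, DEMO_ARG_INT};
  DemoValue ret;
  int ret_tcode = DEMO_ARG_NULL;
  args[0].v_handle = (void*)in; /* const is dropped only for packing */
  args[1].v_handle = out;
  args[2].v_int64 = n;
  return demo_model_run(args, type_codes, 3, &ret, &ret_tcode, NULL);
}
```

Production firmware would call `demo_model_run_typed(in, out, n)` and could drop the registry table entirely, while an RPC server build would keep `demo_lookup` so functions can still be resolved by name.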

I wonder if there are other differences or critiques you could find of the C 
runtime that would improve it? It would be great to at least standardize the 
runtime between these two implementations. This would be in a follow-on RFC, 
though.

### Code Emitter vs TIR-based approach

Given that a number of features implemented in this RFC are on the µTVM roadmap 
(but intended to be implemented at the TIR level), I think the main difference 
in the long run here is that this RFC directly generates C++ code rather than 
passing TIR to the `c` backend. I think there are merits to both this approach 
and the TIR-based AOT being implemented by @giuseros. 

As discussed in the Code Emitter section, I do think that the TIR-based 
approach gives us more future avenues to develop µTVM. However, I don't want to 
ignore how accessible approaches like these are.

Relative to `main` right now, this RFC has a bunch of things that we don't 
have: AOT, memory pinning, API changes. It seems like we could allow an 
implementation like this to coexist as a Project API with roughly these steps:
1. Rework the PoC to consume Model Library Format and implement the Project 
API. Regarding the question of whether this should be applicable to autotuning 
or also to deployment: my thought was that this would be decided by the Project 
API implementation (either create an option or a separate implementation for 
each scenario).
2. When it becomes available, use the TIR-based comprehensive memory planner 
(it seems nearly identical to the one you've implemented, and would generate 
JSON describing the memory pools).
3. Ensure at least the `TVMBackend*` functions are used from the C runtime, 
which provides a pathway to migrate to the TIR-based memory planner and avoids 
diverging too far in terms of generated code.

Finally, I'd also propose we consider simplifying the C runtime API as 
discussed in the Firmware-facing API section.

### Testing and Code Location

Could you speak a bit more to how this code could be tested in the TVM CI? 
That's my chief concern with checking it in as a Project API implementation. I 
posted some [thoughts](https://discuss.tvm.apache.org/t/rfc-tvm-project-api/9449/4) 
about the bar for checking Project API implementations into the TVM repo.

Some discussion points:

D1. Between this approach and a TIR-based AOT, which would you prefer to work 
with, assuming both were implemented?

D2. While the Python APIs are perfectly fine, one goal of Model Library Format 
is to enable downstream tools such as this to work with TVM with less API 
drift. Do you prefer the Python API, or would Model Library Format also be an 
interface you'd be open to consuming?

D3. In general, the challenge with checking code such as this into the TVM repo 
is testing. Particularly with bare-metal code, it's hard to test without 
hardware in the loop, and the TVM CI doesn't really have a provision for that 
now. Do you have a proposal for how we might test this code?




