@Mousius @tqchen @stoa

Great discussion here, and apologies for my delayed reply! I'm adding 
some thoughts here as context for reviewing PR 8023, and because I left 
this thread hanging.

> Thinking about how we expose this to the user, here is a proposed behaviour:
>
>* Default behaviour of AOT should be A0+C0 to be compatible with the rest of 
>the ecosystem
>* With an optional parameter `--no-typed-operators` TVM would instead produce 
>A1+C1 internal operators but leave the A0+C0 entrypoint

I agree with the A1 + C1 idea for internal functions and a way to provide a C0 
interface when an internal function is called externally. It might be nice to 
(eventually) have a way to generate this interface for arbitrary 
functions--this way, AOT (or operator functions) could be called from the TVM 
RPC server.

Adding `--no-typed-operators` makes sense to me, but I would propose changing 
the name. `--no-typed-operators` reads pretty generically to me and could imply 
something like "operator + is aware of the types of its arguments," which is 
always true. 
I'd suggest two modifications:
1. Given we are abusing Target to hold runtime-specific information, let's 
choose a name for which the default 0 value preserves the existing behavior. 
`--typed-operators` defaults to `true` here, but we can't document that properly 
(in `target_kind.cc`) since we are abusing Target.
2. Let's choose a more specific name, such as 
`--dltensor-only-function-signatures`, or alternatively, something that 
references "call_unpacked," maybe `--unpacked-api`? I don't know that 
`--typed-signatures` fully encapsulates the effect of the flag, since it not 
only removes `type_code` but also assumes there is a `DLTensor` for each 
argument.
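
To make the two effects of the flag concrete, here is a rough sketch of the signatures I have in mind. It is not TVM code: `render_prototype` and the `unpacked_api` parameter are hypothetical names, the packed parameter list is only modelled on my understanding of the packed C function convention, and the one-`DLTensor*`-per-argument form is an assumption based on the description above. Note the option defaults to false, so the zero value keeps the existing packed behavior.

```python
def render_prototype(name: str, num_args: int, unpacked_api: bool = False) -> str:
    """Return an illustrative C prototype string for an operator function.

    Hypothetical helper: the exact parameter lists are assumptions made for
    this sketch, not output copied from the TVM code generator.
    """
    if not unpacked_api:
        # Default (flag unset): packed-style signature. Arguments travel in an
        # array and each one carries a type code.
        return (
            f"int32_t {name}(TVMValue* args, int* type_codes, int num_args, "
            "TVMValue* out_ret_value, int* out_ret_tcode, void* resource_handle);"
        )
    # Flag set: no type codes, and every argument is assumed to be a DLTensor.
    params = ", ".join(f"DLTensor* arg{i}" for i in range(num_args))
    return f"int32_t {name}({params or 'void'});"
```

Calling `render_prototype("fused_add", 2, unpacked_api=True)` gives a prototype with two `DLTensor*` parameters and no `type_codes`, which is why I think a name like `--typed-signatures` undersells what changes.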

>* With an optional parameter `--micro-entrypoint` TVM (in AOT mode) would 
>switch to producing a A1+C2 entrypoint at the top level. This has no effect on 
>Graph execution as it doesn’t have such an entrypoint.

I think it'd be interesting to generalize `--micro-entrypoint` to generating C0 
wrappers for functions. Then, you could use AOT with host-driven execution, 
which could be handy for prototyping. 
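
As a minimal sketch of what such a wrapper does, the Python below models only the calling convention: a packed-style entry point that checks the argument count and type codes, then forwards to a typed function. `make_packed_wrapper`, `add_one`, and the `DLTENSOR_HANDLE` value are stand-ins I made up for illustration; a real generator would emit C and use the runtime's own type codes.

```python
from typing import Callable, Sequence

# Stand-in type code meaning "this argument is a tensor handle" (assumed value).
DLTENSOR_HANDLE = 7


def make_packed_wrapper(fn: Callable[..., None], arity: int):
    """Wrap a typed function behind a packed-style (args, type_codes, num_args) call."""

    def wrapper(args: Sequence[object], type_codes: Sequence[int], num_args: int) -> int:
        # Reject calls with the wrong argument count.
        if num_args != arity or len(args) != num_args or len(type_codes) != num_args:
            return -1
        # The typed function assumes every argument is a tensor.
        if any(code != DLTENSOR_HANDLE for code in type_codes):
            return -1
        fn(*args)
        return 0  # 0 signals success, mirroring a C-style status code

    return wrapper


def add_one(inp: list, out: list) -> None:
    """Toy typed operator: writes inp[i] + 1 into out[i]."""
    for i, value in enumerate(inp):
        out[i] = value + 1
```

A host-side caller would then only ever see the packed form, e.g. `make_packed_wrapper(add_one, 2)([[1, 2, 3], out], [7, 7], 2)`, while the operator itself stays typed.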

To add some future-looking thoughts, which don't concern the immediate 
implementation here: I am curious whether or not we may eventually be able to 
implement both the embedded deployment API and a version of the [Module-based 
Model Runtime 
Interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025)
 in TIR as was proposed in the original [Relay 
AOT](https://discuss.tvm.apache.org/t/guideline-relay-aot/5977) guideline post. 
Doing this would remove wrapper code and consolidate the implementation into 
generated code. Then we could just generate the correct (e.g. C0 or 
C1/C1-typed-wrapper) API depending on the runtime scenario (e.g. RPC-driven or 
standalone deployment).

Otherwise, I would think for prototyping it would be interesting to explore a 
generic PackedCFunc wrapper generator to enable use of the AOT and operator 
functions from µTVM RPC server. If we go this route, my proposal would be to 
design the embedded-facing deployment API to enable application-defined memory 
management (e.g. following @stoa's proposal), and then build a wrapper around 
this API to the PackedFunc-based Model Runtime interface (potentially tweaking 
as necessary). In practice, I suspect this wouldn't be a significant departure 
from the embedded deployment API--the main changes we would need to make are 
around initialization, where application-specified memory needs to be used 
rather than dynamically-allocated memory. However, this may mean that there are 
two "consumers" of the AOT TIR output.
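
To illustrate the initialization difference, here is a toy sketch in Python of a model that takes its working memory from a buffer the application owns instead of allocating it. `Workspace` and `Model` are hypothetical names and do not correspond to any existing or proposed TVM API; the point is only that the caller decides where the memory lives and how large it is.

```python
class Workspace:
    """Bump allocator over a buffer owned by the application (hypothetical)."""

    def __init__(self, buffer: bytearray, alignment: int = 16):
        self._view = memoryview(buffer)
        self._offset = 0
        self._alignment = alignment

    def alloc(self, nbytes: int) -> memoryview:
        # Round the current offset up to the next aligned offset within the
        # buffer (this aligns offsets, not absolute addresses).
        start = -(-self._offset // self._alignment) * self._alignment
        end = start + nbytes
        if end > len(self._view):
            raise MemoryError("application-supplied workspace is too small")
        self._offset = end
        return self._view[start:end]


class Model:
    """Toy model whose activations live in caller-provided memory."""

    def __init__(self, workspace: Workspace, activation_bytes: int):
        # No dynamic allocation of activation storage happens here; the
        # memory is carved out of the application's buffer.
        self.activations = workspace.alloc(activation_bytes)
```

A wrapper for the PackedFunc-based interface could construct the `Workspace` from dynamically allocated memory, while an embedded application would hand in a statically sized buffer.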

I'll raise a follow-on RFC around these points as we get closer to landing the 
deployment API. Here, I mainly wanted to provide some thoughts on future 
directions that could affect `--micro-entrypoint`, in response to @Mousius's comment: 

> It’s worth noting I think it’s worth getting this right at the user level so 
> we don’t need to change it much when arguments get refactored.

We could consider broadening `--micro-entrypoint` to `--micro-api` or some 
equivalent should we go this route.

### Dynamic shapes
> That in mind, should there be sufficient space for them to run in this way, 
> they can always turn the option off and incur the penalty for the dynamic 
> behaviour?

It seems like there are cases where there is some constrained variability in 
model input and output shapes. I agree we should prioritize fixed model sizes, 
but it would be great to ensure that the API could handle some limited dynamism 
if needed. We don't need to address this in the initial PR, since we're 
wrapping everything now, but it would be great to consider in the Runtime 
Interface RFC.
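
By "limited dynamism" I mean something like the sketch below: each dimension has a compile-time range, buffers are sized for the upper bound, and the runtime shape is checked against the range. The helper names and the `(low, high)` bound representation are hypothetical and only meant to show the idea.

```python
from math import prod
from typing import Sequence, Tuple

Bound = Tuple[int, int]  # inclusive (low, high) range for one dimension


def max_elements(bounds: Sequence[Bound]) -> int:
    """Element count to reserve so that any in-range shape fits."""
    return prod(high for _, high in bounds)


def check_shape(shape: Sequence[int], bounds: Sequence[Bound]) -> bool:
    """Return True if the runtime shape has the right rank and is in range."""
    return len(shape) == len(bounds) and all(
        low <= dim <= high for dim, (low, high) in zip(shape, bounds)
    )
```

For example, with bounds `[(1, 1), (1, 128), (40, 40)]` only the middle dimension varies, so memory can still be planned statically for the largest case.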

### Header Dependencies

> I’m considering it more in terms of dependencies you have to move across to 
> include in your project, particularly in terms of raw files; `standalone_crt` 
> contains a lot of files that I’m unlikely to need in a deployment, I wouldn’t 
> expect to integrate all of them into an embedded system, particularly if I 
> have strict requirements for code being included.

My opinion is that we should just be explicit about what you need to include 
and make it easy to do that (e.g. place only those include files in some dir 
with appropriate prefix). I think it's always possible to strip code down, but 
we should make it simple to get started, particularly as users may revise 
models frequently. The `standalone_crt` bundle is a start in that direction--we 
may need to further split it as we introduce the embedded C runtime interface. 
I don't think it's a big deal to define extra unused structs, types, or static 
functions, and would not consider that a sufficient condition to create two 
sets of header files. I think the complexity to the end user outweighs the 
benefit of simpler code.

### Embedded C runtime interface

I'll follow up on that RFC with discussion about the API itself.

### Moving forward

Just to state my idea of where I think things are headed here, to remove any 
confusion about my late reply:
1. Let's work to merge PR 8023 (I may propose some name changes to those Target 
flags)
2. Let's continue to discuss on [Embedded C Runtime 
Interface](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951/5)
 and arrive at an acceptable deployment interface.
3. When these two building blocks are landed, we can consider future directions 
around host-driven AOT, dynamic shapes, and whether we should implement the 
user-facing AOT interfaces in TIR.




