# Model Library Format

## Background

TVM's build process for imported models centers around `tvm.relay.build`, a 
function which produces a 3-tuple `(graph_json, lib, params)`. The inference 
workflow then diverges depending on how the user wants to use the compiled 
artifacts:

- If the build targets the C++ runtime and uses the `llvm` backend...
    - and the user wants to run in the same Python instance used to compile: 
the user can directly instantiate a GraphRuntime instance.
    - and the user wants to transfer the model to another Python runtime instance without cross-compiling: the user can call `lib.export_library()` and store `graph_json` and `params` in some ad-hoc way. Then, `tvm.runtime.load_module()` can recreate `lib` in the new runtime instance (see the sketch after this list).
    - and the user wants to transfer the model to another Python runtime 
instance with cross-compiling: the same procedure as above, but pass `fcompile` 
to `export_library` to specify the cross-compiler.
- If the build targets the C++ runtime and uses the `c` backend...
    - and the user wants to run the model with Python on similar architecture: 
the user must compile the produced `c` files to produce an artifact similar to 
the one produced by `lib.export_library()`.  Then, they can load and run the 
library following the procedure above. When saving and loading from the same 
instance (so `graph_json` and `params` are not a consideration), this process 
is handled invisibly by `loadfile_tar`.
    - and the user wants to run the model with Python on different 
architecture: same procedure as above, but with a cross-compiler.
    - and the user wants to run the model with a different frontend language: 
same procedure as above, but the user must translate `graph_json` and `params` 
to a format suitable for the other language
- If the build targets the C runtime...
    - and the user wants to run the model with TVM in Python: not supported; the Python bindings support only the C++ runtime.
    - and the user wants to run standalone: compile with `-system-lib`, store 
the library in a `.tar` with `export_library()`, store `params` and 
`graph_json` to disk in an ad-hoc way, unpack the tar and integrate all pieces 
into a standalone project. A small `main` is needed to launch the C runtime, 
load the model and parameters, and run inference. See `apps/bundle_deploy`.
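
For concreteness, here is a minimal sketch of the second C++-runtime workflow above (`llvm` backend, save in one Python instance and load in another). The file names are arbitrary, and the exact API surface varies slightly between TVM versions:

```python
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Build; `mod` and `params` come from a frontend importer.
graph_json, lib, params = relay.build(mod, target="llvm", params=params)

# Serialize: TVM handles the compiled library; graph_json and params
# must be stored in some ad-hoc way.
lib.export_library("model.so")
with open("graph.json", "w") as f:
    f.write(graph_json)
with open("model.params", "wb") as f:
    f.write(relay.save_param_dict(params))

# Later, possibly in a different Python runtime instance:
loaded_lib = tvm.runtime.load_module("model.so")
with open("graph.json") as f:
    module = graph_runtime.create(f.read(), loaded_lib, tvm.cpu(0))
with open("model.params", "rb") as f:
    module.load_params(f.read())
```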

In all cases *except* the first (compile and run in the same TVM instance), the user needs to serialize the `tvm.relay.build` 3-tuple before doing anything else. However, TVM provides no common function to handle this serialization; it only directly handles serializing the compiled library. The user is left to store the parameters and runtime configuration (e.g. `graph_json`) in whatever way suits the task at hand. As a result, automation that consumes TVM artifacts from disk is always hand-written and specific to the situation.

On microTVM, we are preparing to introduce a Project-level API, implementations of which (a) live in codebases separate from `tvm` and (b) build firmware images from the `tvm.relay.build` artifacts. To make this possible, the API needs to specify how all artifacts from `tvm.relay.build` are placed on disk.

To prepare for this API, we propose Model Library Format, a standard on-disk 
format for microTVM artifacts. microTVM primarily expects users to use the `c` 
or `llvm` backends with a cross-compiler, and build results may contain BYOC 
artifacts as well. As a secondary goal to this RFC, we make some considerations 
such that Model Library Format could be re-used as the standard on-disk format 
produced by `tvmc`.

## Goals

- Describe a standard way to serialize microTVM artifacts for use by downstream automation that compiles them into firmware
- Describe how to implement a load API such as `tvm.runtime.load_module() -> 
GraphRuntimeFactory`.
- Make considerations to accommodate other runtimes such as AOT and VM.

## Non-Goals

- Immediately change the `tvmc` output format to Model Library Format for non-µTVM uses. The initial implementation is focused on microTVM only.
- Decide how to serialize compilation flows unrelated to microTVM

## Model Library Format

Model Library Format is a tar-archived directory tree. A sketch is as follows:

```bash
/
 README.md - A short standardized README for new users, plus a human-readable rendering of metadata.json
 metadata-<n>.json - Overall metadata describing this artifact; version <n>
 crt/ - The content of standalone_crt from TVM build/
  Makefile
  include/
   ...
  src/
   ...
 codegen/ - Stores generated libraries in source or binary form
  host/ - Generated code for target_host
   lib/ - Generated binary object files
    aot.o - Future home of AOT runtime generated code
    devc.o - C++ MetadataModule artifact, unused in µTVM. Should get deleted.
    lib0.o - LLVM module
    lib1.o - LLVM CRT Metadata Module
   src/ - Generated C source
    devc.c - C++ MetadataModule artifact, unused in µTVM. Should get deleted.
    lib0.c - C module
    lib1.c - C CRT Metadata module
  <target_key>/ - Additional directories for code that should be compiled for use on a target.
 parameters/ - Stores simplified parameters
  <model_name>.bson - BSON-serialized runtime parameters (optional)
  <model_name>.params - tvm.relay._save_params format (always present)
  <model_name>.json - JSON-serialized parameters (optional)
 relay.txt - text representation of the relay model compiled, if built from 
Relay
 runtime-config/ - Stores runtime configuration.
  aot/ - AOT runtime config
   (tbd)
  graph/ - Graph runtime config 
   graph.json - Graph runtime JSON
```

### metadata.json

The metadata file contains machine-parseable data describing the build. It also 
contains model-level information that is easier (right now) to parse as a 
single JSON document rather than split into many smaller purpose-specific files.

Following is a proposed schema:

```json
{
    "version": 1,  // version of this document
    "model_name": "<model_name>",  // model name (passed as mod_name= to tvm.relay.build)
    "export_datetime_utc": "%Y-%m-%d %H:%M:%S",  // time of export, in UTC
    "memory": [],  // configured memory map (see Memory Map)
    "target": "",  // TVM target string used to compile this artifact
    "runtimes": ["graph"]  // the runtimes that can launch this model
}
```
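
For illustration, here is a concrete instance of this schema with hypothetical values (the target string is just an example):

```json
{
    "version": 1,
    "model_name": "sine_model",
    "export_datetime_utc": "2021-02-01 18:20:54",
    "memory": [],
    "target": "c -mcpu=cortex-m4 -runtime=c -system-lib=1",
    "runtimes": ["graph"]
}
```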

### Memory Map

In v1, the Memory Map will describe the buffers allocated by the GraphRuntime. 
As the memory planner is improved, this data structure will be expanded. 
Following is the schema for the "memory" key in v1:

```json
[
    {
        "storage_id": <n>,  // storage_id of the buffer, allocated by 
GraphRuntime
        "size_bytes": <n>,  // size of this buffer, in bytes
        "input_binding": ""  // when bound to a model input, the name of that 
input
    },
    // Additional entries
]
```
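
As a concrete example with hypothetical values, a model with one buffer bound to an input named `data` and one intermediate buffer might carry:

```json
[
    {"storage_id": 0, "size_bytes": 3072, "input_binding": "data"},
    {"storage_id": 1, "size_bytes": 65536, "input_binding": ""}
]
```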

## Building a Model Library Format

Here is the process by which TVM creates a Model Library Format tree from the `tvm.relay.build` artifacts (a sketch in Python follows the procedure). Here, `graph_json`, `lib`, and `params` are the returned 3-tuple and `target` is the TVM target. Directory creation is assumed where needed.

1. If `target` contains `--runtime=crt`, copy `$tvm_root/build/standalone_crt` 
to `./crt`.
2. Populate `./codegen` by calling `lib.export_library()`, which should:
    1. Collect all Modules that execute on the host and pass them to `fcompile`. At present, these are the Modules whose `type_key()` is `c` or `llvm`. When the `c` target is used, `fcompile` should copy the generated files into `./codegen/host/src` instead of generating a `.tar`.
    2. (TODO, but not as a result of this RFC) Group the non-host modules by 
target_type (except that ext_dev target_types should be expanded to a unique 
key per BYOC). Save each generated module into a file underneath 
`./codegen/<target_type>`. 
3. Populate `./parameters`:
    - Produce `<model_name>.params` with `tvm.relay._save_params`.
    - Produce `<model_name>.json` with TBD (there doesn't seem to be a standard 
in TVM, so I guess we'll have to propose one)
4. Produce `relay.txt` with `IRModule.get_source`
5. Produce `./runtime-config` as follows:
    - for GraphRuntime: save `graph.json` to `./runtime-config/graph/graph.json`
    - for VM: TBD
    - for AOT: TBD
6. Produce `metadata-<n>.json` by building the required data structure and 
serializing to JSON.

Finally, the entire directory tree should be packaged into a TAR file with 
`.model-lib` extension for easy transmission.
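
Below is a hedged Python sketch of this procedure, assuming the `c` backend and GraphRuntime; the helper logic (directory layout, file writing) is illustrative rather than the final implementation, and `relay.save_param_dict` stands in for `tvm.relay._save_params`. Steps 1, 2, and 4 are abbreviated:

```python
import datetime
import json
import os
import tarfile

from tvm import relay


def export_model_library_format(graph_json, lib, params, target, model_name, tempdir):
    # Steps 1-2 (copy standalone_crt, populate ./codegen via export_library)
    # and step 4 (relay.txt) are omitted here for brevity.
    os.makedirs(os.path.join(tempdir, "codegen", "host", "src"))

    # Step 3: simplified parameters.
    os.makedirs(os.path.join(tempdir, "parameters"))
    with open(os.path.join(tempdir, "parameters", f"{model_name}.params"), "wb") as f:
        f.write(relay.save_param_dict(params))

    # Step 5: runtime configuration for GraphRuntime.
    graph_dir = os.path.join(tempdir, "runtime-config", "graph")
    os.makedirs(graph_dir)
    with open(os.path.join(graph_dir, "graph.json"), "w") as f:
        f.write(graph_json)

    # Step 6: top-level metadata, version 1.
    metadata = {
        "version": 1,
        "model_name": model_name,
        "export_datetime_utc": datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S"),
        "memory": [],  # to be filled from the GraphRuntime memory plan
        "target": str(target),
        "runtimes": ["graph"],
    }
    with open(os.path.join(tempdir, "metadata-1.json"), "w") as f:
        json.dump(metadata, f, indent=4)

    # Package the whole tree into a tar archive with the .model-lib extension.
    with tarfile.open(f"{model_name}.model-lib", "w") as tar:
        tar.add(tempdir, arcname=".")
```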

## Implementation in TVM

The implementation of this RFC will initially consist of the following:

1. Adding a new function, `tvm.runtime.Module#export_model_library_format`. 
This function implements the above procedure for runtimes which use the `c` 
backend.
2. Placing the state necessary to implement `export_model_library_format` into 
GraphRuntimeCodegenModule, and making it accessible from Python.
3. Adding `loadfile_model_lib` which allows loading 
`tvm.runtime.GraphRuntimeFactoryModule` from the file produced by 
`export_model_library_format`.
4. Adding unit tests and changing `apps/bundle_deploy` to use this format as an example.

Following implementation of this RFC, another RFC (Project-level API for µTVM 
projects) will be submitted explaining how we intend to refactor the current 
interaction between TVM and µTVM runtime projects to allow for better 
portability. Also, `tvmc` will begin creating Model Library Format for 
`--runtime=c` targets.

## µTVM Use Cases

Here I briefly walk through some µTVM use cases of Model Library Format to 
consider whether it's a net improvement.

### Building Host-Driven Firmware (µTVM)

At present, µTVM builds host-driven firmware (GraphRuntime instantiated on the 
host) as follows:

1. The user instantiates an implementation of `tvm.micro.Compiler`.
2. TVM invokes `tvm.micro.Compiler#library` to compile each CRT sub-library and 
the code in `./codegen/host`.
3. TVM invokes `tvm.micro.Compiler#binary` to build a binary firmware image 
including each library.

Following implementation of this change, the compilation flow will remain the 
same, but the CRT sources used will be taken from the Model Library Format tree.
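
In pseudocode, using only the hook names above (all other identifiers are hypothetical), the flow looks roughly like:

```python
# Step 1: instantiate a platform-specific tvm.micro.Compiler implementation.
compiler = MyPlatformCompiler()  # hypothetical subclass of tvm.micro.Compiler

# Step 2: compile the CRT sub-libraries and the generated model code. After
# this RFC, the source directories come from the Model Library Format tree.
crt_lib = compiler.library("./crt/src", compiler_options)
model_lib = compiler.library("./codegen/host/src", compiler_options)

# Step 3: link the libraries into a binary firmware image.
firmware = compiler.binary("./build", [crt_lib, model_lib], compiler_options)
```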

### Host-Driven Inference

At present, this is done from within the same Python script that called `tvm.relay.build`, since it's easier to keep all of the state in memory. It can be done from a separate `python` invocation, but there is no standard function to load all of the necessary state, so the process is ad-hoc. Following this change, the GraphRuntimeFactoryModule can be loaded using `tvm.runtime.load_module`, so it will be much easier to reconstruct the state needed for host-driven inference (see the sketch below).
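
Under this RFC, a sketch of the separate-process flow might look like the following; `loadfile_model_lib` (proposed above) would let `tvm.runtime.load_module` ingest the `.model-lib` archive directly, and the `"default"` module name assumes the default `mod_name`:

```python
import tvm
from tvm.contrib import graph_runtime

# Reconstruct the GraphRuntimeFactoryModule from the Model Library Format
# archive (handled by the proposed loadfile_model_lib).
factory = tvm.runtime.load_module("model.model-lib")

# Instantiate a GraphRuntime; graph_json and params are applied internally.
module = graph_runtime.GraphModule(factory["default"](tvm.cpu(0)))
module.set_input("data", input_tensor)  # "data" and input_tensor are placeholders
module.run()
```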

### Building Standalone Firmware (e.g. `apps/bundle_deploy`)

Currently, `apps/bundle_deploy` invokes a custom Python script which produces artifacts in `apps/bundle_deploy/build`. After this RFC, `apps/bundle_deploy/build_model.py` will produce Model Library Format artifacts for the C-runtime-compatible builds.

For `apps/bundle_deploy`, the Makefile will be updated to reference the 
artifacts in standard locations. In the future, it will be possible to write a 
standard script to ingest generated code as a library into project build 
systems.

## Future Work

We expect Model Library Format to change as future considerations are addressed. Each time a change is made, the version number will be incremented. Here are some sketches of future topics that could be tackled.

### Contexts

In heterogeneous execution, a `contexts` key will describe the various DLContexts that TVM expects to be configured on the device. This RFC doesn't seek to fully describe this key; heterogeneous execution is a future goal of µTVM, and until something more concrete is proposed there, this key will just contain an entry for `DLContext(kDLCPU, 0)`.

Here is a strawman:

```json
"contexts": [
    {
        "device_type": "cpu",
        "device_id": 0
    },
    {
        "device_type": "ext_dev",
        "device_id": 0,
        "compiler": "accel_compiler_key",
        "config": {
            // device-specific config, populated by BYOC
        }
    }
]  // configured DLContexts (see DLContext configuration)
```

### Models Targeted to the C++ Runtime

Models targeted to the C++ runtime have very similar structure to those 
targeted at the C runtime. The main difference is in how non-`c` and non-`llvm` 
("non-DSO-Exportable") modules are packaged.

The C++ runtime places all modules in a single shared library, like a "fat binary." At load time, it expects to find a constant `__tvm_dev_mblob` which contains the concatenated `Module#save` output from all of these modules. It then invokes `runtime.module.loadbinary_<type_key>` for each Module in `__tvm_dev_mblob`.

In the C runtime, non-DSO-Exportable modules are typically created by BYOC flows and are meant to be executed by accelerators. Because RAM is typically quite precious on microcontrollers, the C runtime intends to make such generated BYOC code available to the downstream firmware build at compile time. Modules are grouped by `target_type`, and one file is generated per Module containing the output of `Module#save`.

It's possible that both approaches could be taken for the C++ runtime to allow pre-compilation of Modules. However, the simplest and most likely way to move forward would be to create `./codegen/<model_name>.so` and avoid creating subdirectories. When the `c` backend is used with the C++ runtime, `./codegen/host/src` could still be created, or the `.tar` could be placed in `./codegen/<model_name>.tar`.

@tqchen @gromero @leandron @manupa-arm @mdw-octoml @jroesch @mjs @liangfu




