We have built the infrastructure for Bring-Your-Own-Codegen (BYOC). For 
demonstration purposes, a simple CSourceModule-style codegen and runtime is used 
for ccompiler and dnnl (now called oneDNN). The CSourceModule runtime works 
reasonably well on small examples and is easy to understand. However, it poses 
quite a few challenges for the development and deployment of relatively large 
models or models with relatively large inputs.
- Serialization is cumbersome because it normally works per operator and emits 
a wrapper to invoke the library.
- Handling large constants is difficult. We currently have to either introduce 
countless assignments or allocate a large chunk of memory in the static 
segment. Both approaches may significantly increase compilation time.
- For certain backends, like TRT and dnnl, CSourceModule complicates the use of 
their execution engines, or even makes it impossible.

This RFC proposes a JSON runtime, paired with a JSON serializer, for BYOC 
that addresses the above problems. In addition, this type of runtime is more 
familiar to the community, as the graph runtime is more or less in this style 
and we have already implemented a minimal example runtime. This RFC extends the 
minimal example and generalizes it to all backends with an execution engine.

- JSON nodes and code generator/serializer
        - Data structures to represent the nodes and entries in a JSON runtime. 
The serializer converts a Relay program into JSON format.
        ```c++
        class JSONGraphNodeEntry {};
        class JSONGraphNode {};
        // Serialize a Relay program into JSON format. The graph and params
        // should be saved in the same artifact.
        class JSONSerializer : public ExprVisitor {};
        ```
- JSONRuntimeDriver
        - Deserialize the artifact and manage the initialization and invocation 
of the runtime.
        - Cache the engine when loading the library
    ```c++
    class JSONRuntimeDriver : public ModuleNode {
      void Deserialize(); // Deserialize the artifact and engines
      PackedFunc GetFunction(); // Invoke a subgraph using its symbol
      static Module LoadFromBinary(); // Load the JSON binary
      void SaveToBinary(); // Save the module
    };
    ```
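
To make the driver's role a bit more concrete, here is a minimal, self-contained sketch of the "cache the engine at load time, look it up by symbol" idea. It does not use any TVM API: `ToyEngine`, `ToyDriver`, `Load`, and `NumEngines` are hypothetical names invented for this illustration, and `std::function` stands in for `PackedFunc`. The real driver would deserialize the artifact instead of taking a JSON string directly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a backend engine built from one serialized subgraph.
struct ToyEngine {
  explicit ToyEngine(std::string json) : graph_json(std::move(json)) {}
  // A real engine would execute the subgraph; this one only counts invocations.
  int Run() { return ++runs; }
  std::string graph_json;
  int runs = 0;
};

// Hypothetical driver: builds one engine per subgraph at load time and
// hands out callables that reuse the cached engine.
class ToyDriver {
 public:
  // Build and cache the engine for the subgraph registered under `symbol`.
  void Load(const std::string& symbol, const std::string& graph_json) {
    engines_[symbol] = std::make_shared<ToyEngine>(graph_json);
  }

  // Return a callable that invokes the cached engine for `symbol`.
  std::function<int()> GetFunction(const std::string& symbol) const {
    auto it = engines_.find(symbol);
    if (it == engines_.end()) {
      throw std::out_of_range("unknown symbol: " + symbol);
    }
    std::shared_ptr<ToyEngine> engine = it->second;
    assert(engine != nullptr);
    return [engine]() { return engine->Run(); };
  }

  std::size_t NumEngines() const { return engines_.size(); }

 private:
  std::map<std::string, std::shared_ptr<ToyEngine>> engines_;
};
```

Calling `GetFunction` twice with the same symbol returns two callables that share one engine, so the build cost is paid once per subgraph rather than once per invocation.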
        
- JSONRuntimeBase
        - The base for handling a graph. It will be extended by the concrete 
backends, such as TRT, dnnl, and other accelerators.
    ```c++
    class JSONRuntimeBase : public ModuleNode {
      virtual void Run() = 0; // Invoke an engine
      virtual void Init() = 0; // Build an engine
      // Utilities to save and load a json graph.
    };
    ```
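
As a rough illustration of the `Init()` / `Run()` split, the sketch below shows a toy backend extending a base class that owns the graph. Everything here is an assumption made for the example: `ToyNode`, `ToyRuntimeBase`, `ToyAddRuntime`, `SetInput`, and `GetOutput` are hypothetical, the node layout is not the proposed JSON schema, and the "engine" is just a buffer of doubles rather than a real library engine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: an op name plus the indices of the nodes that feed it.
struct ToyNode {
  std::string op;                   // "input" or "add" in this sketch
  std::vector<std::size_t> inputs;  // indices into the node list
};

// Hypothetical base class: owns the graph, leaves engine handling to backends.
class ToyRuntimeBase {
 public:
  explicit ToyRuntimeBase(std::vector<ToyNode> nodes) : nodes_(std::move(nodes)) {}
  virtual ~ToyRuntimeBase() = default;
  virtual void Init() = 0;  // Build an engine from nodes_
  virtual void Run() = 0;   // Invoke the engine

 protected:
  std::vector<ToyNode> nodes_;
};

// Toy backend: the "engine" is one value slot per node.
class ToyAddRuntime : public ToyRuntimeBase {
 public:
  using ToyRuntimeBase::ToyRuntimeBase;

  void Init() override {
    // Check that every node only reads from earlier nodes (topological order).
    for (std::size_t i = 0; i < nodes_.size(); ++i) {
      for (std::size_t in : nodes_[i].inputs) assert(in < i);
    }
    values_.assign(nodes_.size(), 0.0);
  }

  void SetInput(std::size_t node_id, double v) { values_.at(node_id) = v; }

  void Run() override {
    for (std::size_t i = 0; i < nodes_.size(); ++i) {
      if (nodes_[i].op != "add") continue;  // input slots are set by the caller
      double sum = 0.0;
      for (std::size_t in : nodes_[i].inputs) sum += values_[in];
      values_[i] = sum;
    }
  }

  // The last node is treated as the graph output in this sketch.
  double GetOutput() const {
    assert(!values_.empty());
    return values_.back();
  }

 private:
  std::vector<double> values_;
};
```

The point of the split is that `Init()` runs once per loaded subgraph while `Run()` can be called repeatedly, which is what allows a backend to build its execution engine up front and reuse it.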
  
- Open questions
        - Symbolic representation of op attributes, e.g. `Expr start` and `Expr 
end` in the `arange` op. Normally we should not offload this type of node to 
accelerators, but how can we serialize them if we want to support them, given 
that some of them may not be data-dependent?
        - It is intuitive for BYOC to be used along with uTVM. How will this 
JSON runtime be connected with other runtimes like uTVM?

@tqchen @thierry @matt-arm @masahi @comaniac @manupa-arm @jonso @ramana-arm




