I think these are fair problems, and JSON is an OK solution for some particular 
backends. However, I think it is especially important for us to think about 
the infrastructure implications in the long run, and to discuss the 
solution in a case-by-case manner.


The JSON runtime is essentially another layer (abstraction) of graph. The 
codepath becomes

- IRModule -> compile-> runtime::Module(json-style) -> interpret -> external 
API.

As usual, introducing an additional layer of abstraction can always solve our 
immediate problem, but we should ask whether that is really the approach we 
want to take. Right now there are three types of external APIs.

**External API Types**

- E0: Library functions (ArmCompute, DNNL, cuDNN) that provide routines in the 
libraries but not necessarily a serialization format for weights and functions. 
- E1: Graph runtimes that construct the graph on the fly via a series of API 
calls, plus a run function.
- E2: Graph-runtime-style frameworks (e.g. TF) that have a serialization 
format (e.g. protobuf) for both weights and functions.

**Problems to Solve**
- P0: How to serialize the constants(weights)
- P1: How to serialize the computation(code)

**Discussion**

Our overall principles are: 
- Minimize "external-specific passes": make sure that the compilation stays 
in IRModule as much as possible. 
- Reduce layers of abstraction as much as possible.

For E2 (e.g. TF): we should definitely avoid the additional layer of 
abstraction, because we can simply go ahead and use the native serialization 
format. Both P0 and P1 are naturally solved in this case.

For E0, the best approach is to lower the sequence of library calls into TIR 
calling sequences once the unified IR lands. The argument is that since these 
are already API functions, direct compilation opens the path for future AOT. 
See also the discussion here: https://discuss.tvm.ai/t/guideline-relay-aot/5977 
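To make the E0 lowering concrete, here is a minimal sketch of what the directly 
compiled result could look like. The `ext_*` routines and `fused_mul_relu` are 
hypothetical stand-ins for real library symbols (e.g. DNNL); the real lowering 
would emit `tir.call_extern` to the actual library API:

```cpp
#include <cstddef>

// Hypothetical external library routines standing in for real library
// symbols; the math here is only a placeholder.
static void ext_mul(const float* in, const float* w, float* out, int n) {
  for (int i = 0; i < n; ++i) out[i] = in[i] * w[i];
}
static void ext_relu(float* buf, int n) {
  for (int i = 0; i < n; ++i) buf[i] = buf[i] > 0.f ? buf[i] : 0.f;
}

// Sketch of an AOT-compiled TIR function for a fused subgraph: a flat
// sequence of external calls, with no interpreter layer in between.
extern "C" void fused_mul_relu(const float* in, const float* w,
                               float* out, int n) {
  ext_mul(in, w, out, n);
  ext_relu(out, n);
}
```

Because the result is just a call sequence, it can be compiled ahead of time 
like any other generated function.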

The case of E1 is certainly more complicated; it requires two functions:
- Init, which constructs the graph (Module) and only runs once.
- Run, which executes through the existing library.

These could still be lowered first to TIR and then to calls into the C 
API, at least for the code (P1) part.
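The init-once/run split can be sketched as follows; the `Engine` struct and the 
`subgraph_init`/`subgraph_run` names are illustrative, not any real library's 
API:

```cpp
// Hypothetical C API of an E1-style library. The lowered TIR would emit
// two calls: an init that constructs the graph exactly once, and a run
// that executes the constructed graph.
struct Engine { bool initialized; int value; };

static Engine g_engine{false, 0};

extern "C" void subgraph_init(int weight) {
  if (!g_engine.initialized) {
    g_engine.value = weight;      // "construct the graph" once
    g_engine.initialized = true;
  }
}

extern "C" int subgraph_run(int input) {
  return input + g_engine.value;  // execute the pre-built graph
}
```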

The main challenge for us is how to come up with a solution for P0: weights, or 
large constants. I do think this part deserves more careful thought, 
so I will discuss it in the next section.

### Weight(Constant) Serialization

My first reaction to weight serialization is that the weights should be lifted 
out of the external codegen when possible. This way we can reuse the TVM 
runtime's native mechanism to store these NDArrays, and as the serialization 
mechanism grows to more varieties (static code section, binary) for 
different cases, we will be able to benefit from all of them. This works for 
most APIs in E0.
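As a minimal sketch of the lifting (the `ext_dot` name is hypothetical): 
instead of the external codegen embedding the weights in its own format, the 
generated function receives them as ordinary arguments, so the hosting runtime 
can store and serialize them with its native NDArray mechanism:

```cpp
// Hypothetical generated code for an E0-style external kernel. The weights
// are lifted out: they arrive as a plain argument rather than being baked
// into an external-specific serialization format.
extern "C" float ext_dot(const float* weights, const float* x, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i) acc += weights[i] * x[i];
  return acc;
}
```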

Of course, the main problem with such an approach is the E1 case, since many of 
these APIs need to "pre-compute" some intermediate representation of the 
weights from the existing ones. There are two possible mechanisms:

- M0: Couple the code serialization and the metadata (constant) serialization 
into a single format in a single runtime::ModuleNode.
- M1: De-couple the code serialization from the metadata serialization.

All of our current runtime module designs are based on M0. Let me give an 
example of M1.

### A Layered Approach

Here is an example of the layered approach:

```c++
// Using DSO library as an example for code serialization

static Engine cached_engine;

TVM_EXPORT_TYPED_PACKED_FUNC(__InitModule, [](Array<NDArray> metadata) {
   InitEngine(&cached_engine, metadata);
});

// Alternatively, the destroy can be a C API that the DSOModule recognizes 
TVM_EXPORT_TYPED_PACKED_FUNC(__DestroyModule, []() {
   DestroyEngine(&cached_engine);
});
```

```c++
class ModuleMetaDataWrapper : public runtime::ModuleNode {
  public:
    Init() {
       // get function from its imported modules
       PackedFunc init = 
            this->imported_modules[0]->GetFunction("__InitModule");
       // can also pass meta data in via positional sequence
       //  before runtime::Array lands
       init(metadata);
    }  
    ~ModuleMetaDataWrapper() {
       PackedFunc destroy = 
             this->imported_modules[0]->GetFunction("__DestroyModule");
       destroy();
    }
 
    GetFunction(name) { 
       if (name != "__InitModule" && name != "__DestroyModule") {
          return this->imported_modules[0]->GetFunction(name);
       }
    }

  private:
   // metadata serialized along with ModuleMetaDataWrapper;
   // can support other kinds of metadata
   Array<NDArray> metadata;
};
```

When generating code, we can generate a 
`ModuleMetaDataWrapper{imports={DSOModule(dnnl_code)}}`.
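As a rough mock of the resulting control flow (plain C++ stand-ins, not TVM's 
actual classes; all names below are illustrative): the wrapper holds the 
metadata, feeds it to the imported code module once at init, and delegates all 
other function lookups:

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Plain C++ stand-ins for the runtime pieces; not TVM's actual API.
using PackedFunc = std::function<int(int)>;

struct MockDSOModule {
  std::map<std::string, PackedFunc> table;
  PackedFunc GetFunction(const std::string& name) { return table[name]; }
};

struct MockMetaDataWrapper {
  MockDSOModule* imported;
  std::vector<int> metadata;  // stands in for Array<NDArray>
  int engine = 0;             // stands in for the cached engine state

  void Init() {
    // the init path consumes the metadata exactly once, up front
    for (int m : metadata) engine += m;
  }
  PackedFunc GetFunction(const std::string& name) {
    // all other lookups are delegated to the imported (code) module
    return imported->GetFunction(name);
  }
};
```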

The main advantage of the layered approach is that we can de-couple the 
constant serialization from the code serialization itself. We can mix and 
match serialization mechanisms, for example by building a common constant 
serialization format, or reusing existing ones.

Another advantage is that it opens the path for AOT to completely discard the 
interpreter if necessary, while still keeping other runtimes possible (e.g. the 
DSOModule can still be replaced by a JSON one).

## Discussions 

My take on the JSON runtime is that it is a short-term solution to the problem 
we want to solve. While it is fine for a particular runtime to adopt JSON 
as a serialization format, I don't think it is a good idea to introduce another 
layer of common abstraction, so it may not be the long-term solution that we 
are seeking. 

As always, it would be helpful to discuss the problems in a case-by-case 
manner. In particular, I would love to get everyone's view on the de-coupling 
and refine the ideas here, so that we can build a modular solution that works 
for all runtime cases (AOT, VM, graph) through a single API.





---
[Visit 
Topic](https://discuss.tvm.ai/t/byoc-runtime-json-runtime-for-byoc/6579/4) to 
respond.
