I think these are fair problems, and JSON is an OK solution for some particular backends. However, I think it is particularly important for us to think about the infrastructure implications in the long run, and we want to discuss the solution in a case-by-case manner.
The JSON runtime is essentially another layer of abstraction over the graph. The codepath becomes: IRModule -> compile -> runtime::Module (json-style) -> interpret -> external API. As usual, introducing an additional layer of abstraction always solves the problem at hand, but we should ask whether that is really the approach we want to take.

Right now there are three types of external APIs.

**External API Types**

- E0: Library functions (ArmCompute, DNNL, cuDNN) that have routines in the libraries, but not necessarily a serialization format for weights and functions.
- E1: Graph runtimes that construct the graph on the fly with a series of API calls, plus a run function.
- E2: Graph-runtime-style APIs (e.g. TF) that have a serialization format (e.g. protobuf) for both weights and functions.

**Problems to Solve**

- P0: How to serialize the constants (weights).
- P1: How to serialize the computation (code).

**Discussion**

Our overall principles are:

- Minimize "external specific passes": make sure that the compilation stays in IRModule as much as possible.
- Reduce layers of abstraction as much as possible.

For E2 (e.g. TF): we should definitely avoid the additional layer of abstraction, because we can simply go ahead and use the native serialization format. Both P0 and P1 are naturally solved in this case.

For E0, the best approach is to lower the sequence of library calls into TIR calling sequences after the unified IR. The argument is that since these are already API functions, direct compilation opens the path for future AOT. See also the discussion here: https://discuss.tvm.ai/t/guideline-relay-aot/5977

The case of E1 is certainly more complicated; it requires two functions:

- Init, which constructs the graph (Module) and only runs once.
- Run, which executes the existing library.

They could still certainly be lowered first to TIR and then to calls into the C API, at least for the code (P1) part.
The main challenge for us is how to come up with a solution for P0: weights, or large constants. I do think this part deserves more careful thought, so I will discuss it in the next section.

### Weight (Constant) Serialization

My first reaction to weight serialization is that the weights should be lifted outside of the external codegen when possible. This way we can reuse the TVM runtime's native mechanism to store these NDArrays, and as the serialization mechanism improves to cover more variants (static code section, binary) for different cases, we will be able to benefit from all of them. This works for most APIs in E0.

Of course, the main problem with such an approach arises for the cases in E1, as many of these APIs need to "pre-compute" some intermediate representations of the weights from the existing ones. There are two possible mechanisms:

- M0: Couple the code serialization and the meta-data (constant) serialization into a single format in a single runtime::ModuleNode.
- M1: De-couple the code serialization and the meta-data serialization.

All of our current runtime module designs are based on M0.
Let me give an example of M1.

### A Layered Approach

Here is an example of the layered approach:

```c++
// Using a DSO library as an example for code serialization.
static Engine cached_engine;

TVM_EXPORT_TYPED_PACKED_FUNC(__InitModule, [](Array<NDArray> metadata) {
  InitEngine(&cached_engine, metadata);
});

// Alternatively, the destroy can be a C API that the DSOModule recognizes.
TVM_EXPORT_TYPED_PACKED_FUNC(__DestroyModule, []() {
  DestroyEngine(&cached_engine);
});
```

```c++
class ModuleMetaDataWrapper : public runtime::ModuleNode {
 public:
  void Init() {
    // Get the function from the imported module.
    PackedFunc init = this->imported_modules[0]->GetFunction("__InitModule");
    // Can also pass the meta data in via a positional sequence
    // before runtime::Array lands.
    init(metadata);
  }

  ~ModuleMetaDataWrapper() {
    PackedFunc destroy = this->imported_modules[0]->GetFunction("__DestroyModule");
    destroy();
  }

  PackedFunc GetFunction(const std::string& name) {
    if (name != "__InitModule" && name != "__DestroyModule") {
      return this->imported_modules[0]->GetFunction(name);
    }
    return PackedFunc();
  }

 private:
  // Meta data serialized along with the ModuleMetaDataWrapper;
  // can support other kinds of meta data.
  Array<NDArray> metadata;
};
```

When generating code, we can generate a `ModuleMetaDataWrapper{imports={DSOModule(dnnl_code)}}`.

The main advantage of the layered approach is that we can de-couple the constant serialization from the code serialization itself. We can mix and match serialization mechanisms: for example, build a common constant serialization format, or reuse existing ones. Another advantage is that it opens a path for AOT to completely discard the interpreter if necessary, while still keeping other runtimes possible (e.g. the DSOModule can still be replaced by a JSON one).

## Discussions

My take on the JSON runtime is that it is a short-term solution to the problem we want to solve.
While it is fine for a particular runtime to adopt JSON as a serialization format, I don't think it is a good idea to introduce another layer of common abstraction, so it may not be the long-term solution we are seeking. As always, it would be helpful to discuss the problems in a case-by-case manner. In particular, I would love to get everyone's view on the de-coupling and to refine the ideas here, so that we can build a modular solution that works for all runtime cases (AOT, VM, graph) through a single API.

---

[Visit Topic](https://discuss.tvm.ai/t/byoc-runtime-json-runtime-for-byoc/6579/4) to respond.