I think we want to dissect these points a bit:

### F0: multi-model support

Support for loading multiple models was discussed and resolved as part of the 
module-based runtime interface: 
https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025/47.
Although @FrozenGene's current implementation might only handle the 
single-model case initially, it should not be hard to add further support.

### F1: compress the binary blob

This should not be hard to add as an optional feature, via a protocol 
flag in the serialized data. It would introduce a runtime dependency on zlib, 
which may not always be available on embedded devices. The main thing is to 
keep backward compatibility.
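A minimal sketch of the idea, in Python for brevity: a single protocol-flag byte after a (hypothetical) magic prefix lets new readers detect compression while uncompressed blobs keep working. The names `MAGIC`, `FLAG_RAW`, and `FLAG_ZLIB` are illustrative assumptions, not TVM's actual format.

```python
import zlib

MAGIC = b"TVMB"      # hypothetical magic prefix, not TVM's real header
FLAG_RAW = 0x00      # payload stored as-is (backward-compatible default)
FLAG_ZLIB = 0x01     # payload compressed; implies a runtime zlib dependency

def serialize(blob: bytes, compress: bool = False) -> bytes:
    # One flag byte after the magic selects the encoding.
    if compress:
        return MAGIC + bytes([FLAG_ZLIB]) + zlib.compress(blob)
    return MAGIC + bytes([FLAG_RAW]) + blob

def deserialize(data: bytes) -> bytes:
    assert data[:4] == MAGIC, "unknown format"
    flag, payload = data[4], data[5:]
    if flag == FLAG_ZLIB:
        return zlib.decompress(payload)  # only needed when the flag is set
    return payload

blob = b"\x00" * 1024
assert deserialize(serialize(blob, compress=True)) == blob
assert deserialize(serialize(blob)) == blob
```

Embedded targets that lack zlib can simply refuse blobs carrying the compression flag, which keeps the dependency truly optional.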


### F2: simplify the operator name

There are pros and cons here: notably, the operator names are directly 
tied to the function names themselves and are useful for debugging, so 
simplifying them may not be a good idea.

### F3: multi-threading setups

This is something that deserves more thought. Depending on the 
caller (wanting to control threading vs. leaving threading to the TVM runtime) 
and the platform (embedded vs. server), the best solution can differ.

Right now the graph runtime is assumed to be used in a thread-local fashion, as 
the local memory is pre-allocated. There are opportunities to share the 
parameters (but not the activations) among the executors. The main question 
for F3 is not whether such optimizations are possible, but how to do them.
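The parameter/activation split can be sketched as follows: weights are read-only and can be aliased across executors, while each executor keeps its own pre-allocated activation workspace. The `Executor` class here is a toy stand-in, not the actual graph runtime.

```python
class Executor:
    """Toy executor: shares read-only parameters, owns its activations."""

    def __init__(self, shared_params):
        self.params = shared_params       # aliased across executors (weights)
        self.activations = [0.0] * 4      # private pre-allocated scratch memory

params = {"w": [1.0, 2.0]}                # loaded once, e.g. from the artifact
ex_a, ex_b = Executor(params), Executor(params)

assert ex_a.params is ex_b.params                 # one copy of the weights
assert ex_a.activations is not ex_b.activations   # separate scratch buffers
```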

Different thread-safety models affect the user interface:
- The stateful set/get/run interface remains the most natural one under 
minimal resource requirements, and we will likely keep it for embedded. 
It means, however, that the graph runtime itself is stateful (since `set` 
can affect `run`). In multi-threaded settings, the user caches an executor 
in thread-local storage (TLS).
- Alternatively, a predict API could be made fully stateless, but that would 
introduce an additional dependency on Tuple for multiple outputs and might 
optionally depend on dynamic allocation.
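The two interface styles above can be contrasted with a toy sketch (the doubling "graph" and function names are placeholders, not the real runtime API):

```python
import threading

class GraphExecutor:
    """Stateful set/run/get style: pre-allocated, minimal resources,
    but not safe to share across threads."""

    def __init__(self):
        self._input = None
        self._output = None

    def set_input(self, value):
        self._input = value          # mutates executor state

    def run(self):
        self._output = self._input * 2   # placeholder for the real graph

    def get_output(self):
        return self._output

_tls = threading.local()

def thread_local_executor():
    # Multi-threaded callers cache one executor per thread (TLS),
    # so the stateful interface stays correct without locks.
    if not hasattr(_tls, "executor"):
        _tls.executor = GraphExecutor()
    return _tls.executor

def predict(value):
    """Stateless alternative: no retained state, returns a tuple of
    outputs, at the cost of dynamic allocation on every call."""
    executor = GraphExecutor()
    executor.set_input(value)
    executor.run()
    return (executor.get_output(),)
```

The TLS route keeps the embedded-friendly set/run/get surface; the stateless `predict` trades allocations for a simpler concurrency story.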

---
[Visit Topic](https://discuss.tvm.apache.org/t/c-c-runtime-multimodel-support/8518/2) to respond.
