I think we want to dissect these points a bit:

### F0: multi-model support

Support for loading multiple models was discussed and resolved as part of the 
module-based runtime interface: 
https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025/47.
Although @FrozenGene's current implementation might only handle the 
single-model case initially, it should not be hard to add further support.

### F1: compress the binary blob

This should not be hard to add as an optional feature, via a protocol 
flag in the serialized data. It would introduce a runtime dependency on zlib, 
which may not always be available on embedded devices. The main thing is to 
keep backward compatibility.
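A minimal sketch of the idea, in Python for brevity: a single protocol-flag byte after a (hypothetical) magic prefix lets new readers detect compression while uncompressed blobs keep working. The names `MAGIC`, `FLAG_RAW`, and `FLAG_ZLIB` are illustrative assumptions, not TVM's actual format.

```python
import zlib

MAGIC = b"TVMB"      # hypothetical magic prefix, not TVM's real header
FLAG_RAW = 0x00      # payload stored as-is (backward-compatible default)
FLAG_ZLIB = 0x01     # payload compressed; implies a runtime zlib dependency

def serialize(blob: bytes, compress: bool = False) -> bytes:
    # One flag byte after the magic selects the encoding.
    if compress:
        return MAGIC + bytes([FLAG_ZLIB]) + zlib.compress(blob)
    return MAGIC + bytes([FLAG_RAW]) + blob

def deserialize(data: bytes) -> bytes:
    assert data[:4] == MAGIC, "unknown format"
    flag, payload = data[4], data[5:]
    if flag == FLAG_ZLIB:
        return zlib.decompress(payload)  # only needed when the flag is set
    return payload

blob = b"\x00" * 1024
assert deserialize(serialize(blob, compress=True)) == blob
assert deserialize(serialize(blob)) == blob
```

Embedded targets that lack zlib can simply refuse blobs carrying the compression flag, which keeps the dependency truly optional.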


### F2: simplify the operator name

There are pros and cons here: notably, the operator names are directly 
tied to the function names themselves and are useful for debugging, so 
simplifying them may not be a good idea.

### F3: multi-threading setups

This is something that deserves more thought. Depending on the 
caller (wanting to control threading vs. leaving threading to the TVM runtime) 
and the platform (embedded vs. server), the best solution can differ.

Right now the graph runtime is assumed to be used in a thread-local fashion, as 
the local memory is pre-allocated. There are opportunities to share the 
parameters (but not the activations) among the executors. The main question 
for F3 is not whether such optimizations are possible, but how to do them.
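The parameter/activation split can be sketched as follows: weights are read-only and can be aliased across executors, while each executor keeps its own pre-allocated activation workspace. The `Executor` class here is a toy stand-in, not the actual graph runtime.

```python
class Executor:
    """Toy executor: shares read-only parameters, owns its activations."""

    def __init__(self, shared_params):
        self.params = shared_params       # aliased across executors (weights)
        self.activations = [0.0] * 4      # private pre-allocated scratch memory

params = {"w": [1.0, 2.0]}                # loaded once, e.g. from the artifact
ex_a, ex_b = Executor(params), Executor(params)

assert ex_a.params is ex_b.params                 # one copy of the weights
assert ex_a.activations is not ex_b.activations   # separate scratch buffers
```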

Different thread-safety models affect the user interface:
- The stateful set/get/run interface remains the most natural one under 
minimal resource requirements, and we will likely keep it for embedded. 
It means, however, that the graph runtime itself is stateful (since `set` 
can affect `run`). In multi-threaded settings, the user caches an executor 
in thread-local storage (TLS).
- Alternatively, a predict API could be made fully stateless, but that would 
introduce an additional dependency on Tuple for multiple outputs and might 
optionally depend on dynamic allocation.
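The two interface styles above can be contrasted with a toy sketch (the doubling "graph" and function names are placeholders, not the real runtime API):

```python
import threading

class GraphExecutor:
    """Stateful set/run/get style: pre-allocated, minimal resources,
    but not safe to share across threads."""

    def __init__(self):
        self._input = None
        self._output = None

    def set_input(self, value):
        self._input = value          # mutates executor state

    def run(self):
        self._output = self._input * 2   # placeholder for the real graph

    def get_output(self):
        return self._output

_tls = threading.local()

def thread_local_executor():
    # Multi-threaded callers cache one executor per thread (TLS),
    # so the stateful interface stays correct without locks.
    if not hasattr(_tls, "executor"):
        _tls.executor = GraphExecutor()
    return _tls.executor

def predict(value):
    """Stateless alternative: no retained state, returns a tuple of
    outputs, at the cost of dynamic allocation on every call."""
    executor = GraphExecutor()
    executor.set_input(value)
    executor.run()
    return (executor.get_output(),)
```

The TLS route keeps the embedded-friendly set/run/get surface; the stateless `predict` trades allocations for a simpler concurrency story.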

---
[Visit Topic](https://discuss.tvm.apache.org/t/c-c-runtime-multimodel-support/8518/2) to respond.
