>This is a follow-up RFC to
>https://discuss.tvm.apache.org/t/compute-graph-pipeline/8957

>PR https://github.com/apache/tvm/pull/7892

# **Split Relay Graph into Subgraphs, then Pipeline the Subgraphs: RFC**

In this RFC, we present a new framework for subgraph pipelining: it first splits a Relay graph into multiple subgraphs (groups of operators) and then schedules those subgraphs in a pipeline. The framework consists of 3 parts and its logic is self-contained; the PR is submitted as 
https://github.com/apache/tvm/pull/7892

#1. Relay graph splitting.

All of the splitting logic lives in the function 
[analysis.py](https://github.com/apache/tvm/pull/7892/files#diff-b5e426a9ad45cd945966b27c224b279fdc04cb4f24b9e17ca53ef9660c1e39c9):pipeline_graph.
Its inputs are a Relay graph and an integer array describing how to split it; its output is multiple Relay graphs (the subgraphs) that can be used for the subsequent pipeline scheduling.

#2. Runtime/Executor and Interface

[subgraph_executor.cc](https://github.com/apache/tvm/pull/7892/files#diff-29d8c7c41685448cf99fcf59ed2bcfba05e04413392a5f0971b96a26ff95e633)
 and 
[subgraph_executor.h](https://github.com/apache/tvm/pull/7892/files#diff-29d8c7c41685448cf99fcf59ed2bcfba05e04413392a5f0971b96a26ff95e633)
 in the folder 
[src/runtime/subgraph/](https://github.com/apache/tvm/pull/7892/files#diff-29d8c7c41685448cf99fcf59ed2bcfba05e04413392a5f0971b96a26ff95e633),
together with subgraph_executor.py, implement the subgraph executor/runtime. They provide the interface used by the frontend to

"create subgraph executor", "set input", "run", "get output", and "stop".

#3. Scheduling and Memory movement

The remaining files (all except the runtime/executor files mentioned in #2) implement the pipeline scheduling, memory movement, and synchronization logic.

Before this RFC, I had an initial RFC that only discussed/implemented the Relay graph splitting. As we discussed, that did not seem self-contained, so I created this new RFC with subgraph runtime/executor support to make the whole logic self-contained and review friendly.

We also discussed integrating the subgraph work into BYOC to avoid redundant code. Integration into BYOC is convenient for backends that use the BYOC mechanism, but the pipeline logic would then not be available for backends that do not go through BYOC, for example VTA. For that reason, we keep the subgraph split logic as an independent function so that this pipeline feature is available for both BYOC and non-BYOC backends.

## **Motivation for subgraph pipeline**

Many of the edge devices used for inference are SoCs. The ideal scenario is to use the SoC's multiple compute units together, in parallel. One existing solution is to batch across the different units, for example processing image1 on the CPU, image2 on the GPU, and image3 on the FPGA.

One issue is that different units have very different latencies: a slow unit such as the CPU may be 10+ times slower than a fast unit such as the FPGA. Using the whole compute graph as the scheduling unit therefore adds a lot of latency per image, even if overall throughput improves. Hence we need a smaller scheduling unit, such as a subgraph, to use the heterogeneous SoC hardware more efficiently and reach better performance.

Another benefit of subgraph pipelining is the capability to use more backends in parallel for inference; with the help of RPC, the backends joined into the subgraph pipeline can even be located remotely in the cloud.

In our use case, we started thinking about the subgraph pipeline feature while cooperating with Autoware on using VTA as an open-source vision control unit solution on the Ultra96 board. At that time we experienced performance issues with image processing throughput: VTA uses the CPU to process the first conv layer and FPGA+CPU for the other layers, so while the CPU is processing the first layer, the FPGA is actually idle. We then tried splitting the graph into 2 parts and pipelining them, and got about a 30%+ performance improvement without any hardware change.

## **Background and Prior Work**

This subgraph runtime/executor relies on graph_executor for operator execution and storage setup; it is one more wrapper on top of graph_executor.

## **Goals & Scope**

The goal of this new framework & runtime is to provide a solution that achieves performance improvements by doing operator/subgraph level scheduling. This initial version includes

#1. a function to split the compute graph

#2. a tiny runtime/executor

Some features that may be useful but are not available in this version:

#1. The subgraph split configuration comes from the user; automatic generation or optimization of the split is not there yet.

#2. There is no CPU affinity configuration for the threads that run the pipeline control logic.

## **Architecture of solution**

In this RFC and PR, we split the work into 3 parts; from top to bottom they are as follows.

![](https://www.evernote.com/shard/s201/res/0e51c8d8-9bcd-add1-4c03-18c9e114917f)

### **Module 1: Relay graph split logic <analysis.py:pipeline_graph>**

The function of module 1 is to split a Relay graph into a group of Relay subgraphs. Following is an example split for a simple network.

![](https://www.evernote.com/shard/s201/res/e75b76d3-393d-bfe6-797c-7dd678aa8480)

There are 2 inputs to this function: No. 1 the Relay graph, and No. 2 the split array; the output is an array of modules. The function does 2 main pieces of work: first, it splits one Relay expression into multiple independent expressions; second, it rebuilds the metadata references so that each new expression can find its metadata correctly. The detailed logic can be found in the pipeline_graph function.
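
As a concrete illustration, a minimal sketch of the splitting step might look like the following. The exposure point of pipeline_graph and the meaning of the split array (assumed here to be operator indices at which to cut) are taken from the description above, so treat them as assumptions rather than the final API.

```python
import tvm
from tvm import relay

# Build a small network: conv2d -> relu -> global pooling.
data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
conv = relay.nn.conv2d(data, weight, channels=16, kernel_size=(3, 3), padding=(1, 1))
act = relay.nn.relu(conv)
pooled = relay.nn.global_avg_pool2d(act)
func = relay.Function(relay.analysis.free_vars(pooled), pooled)

# Split after operator index 1 (the relu), assuming the split array holds the
# operator indices at which to cut; [1] would therefore yield two subgraphs.
subgraphs = relay.analysis.pipeline_graph(func, [1])  # hypothetical exposure point

# Each returned subgraph is an ordinary Relay module and can be built
# independently, e.g. one for the CPU and one for a GPU or accelerator target.
libs = [relay.build(sg, target="llvm") for sg in subgraphs]
```

The resulting built modules can then be handed to the subgraph runtime described in Module 2.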

### **Module 2: Subgraph runtime/executor [Interface]**

After splitting a Relay graph into a group of subgraphs, the user can build each subgraph with any backend they prefer; the subgraph runtime then does the job of scheduling these modules/libraries in a pipelined fashion.

The subgraph runtime can be split into 2 modules. One is the interface part, which is responsible for interacting with the caller to set the data/parameters/modules that need to be scheduled and for doing the initialization/run/stop work. The second part is the scheduling logic, which is responsible for pipelining the subgraphs and transferring data between them. Here we only talk about the interface part.

The subgraph runtime [Interface] has 3 files: subgraph_executor.py, 
[subgraph_executor.cc](https://github.com/apache/tvm/pull/7892/files#diff-29d8c7c41685448cf99fcf59ed2bcfba05e04413392a5f0971b96a26ff95e633), and 
[subgraph_executor.h](https://github.com/apache/tvm/pull/7892/files#diff-29d8c7c41685448cf99fcf59ed2bcfba05e04413392a5f0971b96a26ff95e633).

**subgraph_executor.py** is the first interface; it receives the data and parameters used for subgraph runtime creation. Internally it calls the Cython functions exposed by subgraph_executor.cc/.h to create a "SubGraphRuntime" instance on the C++ side, then goes through the functions provided by "SubGraphRuntime", such as "SetInput" and "GetOutput", to set inputs and get outputs from the C++ part (subgraph_executor.cc/.h).

The other work done in subgraph_executor.py is that it creates a "graph_executor" module for every subgraph module; the "module" information of each "graph_executor" is then sent to the C++ part as the scheduling unit for the pipeline logic.
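
A loose illustration of that flow (not the actual PR code) is sketched below, assuming the C++ creation routine is exported as a packed function; the helper name and its argument layout are hypothetical.

```python
import tvm
from tvm.contrib import graph_executor

def create_subgraph_runtime(built_mods, devices, create_func):
    """built_mods: modules returned by relay.build, one per subgraph.
    devices: the tvm device each subgraph should run on.
    create_func: the packed function assumed to be exported by
    subgraph_executor.cc (e.g. looked up via tvm.get_global_func)."""
    # One graph_executor per subgraph module: these are the scheduling units.
    executors = [
        graph_executor.GraphModule(mod["default"](dev))
        for mod, dev in zip(built_mods, devices)
    ]
    # Hand the per-subgraph executor modules to the C++ SubGraphRuntime, which
    # owns the pipeline threads and the data forwarding between subgraphs.
    return create_func(*[e.module for e in executors])
```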

**[subgraph_executor.cc](https://github.com/apache/tvm/pull/7892/files#diff-29d8c7c41685448cf99fcf59ed2bcfba05e04413392a5f0971b96a26ff95e633)/.h** is the C++ part that interacts with both the caller and the lower-level scheduling logic. It contains a class named "SubGraphRuntime" that does the control-level scheduling work, such as run/stop etc.; it also provides the storage service for the variables used by the scheduling work.

What this part of the logic does is shown in the following diagram:

![](https://www.evernote.com/shard/s201/res/43d013f8-a5bb-efa7-5a1e-61be98932b36)

### **Module 3: Subgraph runtime/executor [Scheduling logic]**

The scheduling logic consists of four files: "subgraph_function.cc/.h", "subgraph_data.h", and "subgraph_struct.h". This module does the following work (a conceptual sketch follows after the list).

#1. Maintain a **pipeline thread** pool

#2. **Data forwarding** between different subgraphs

#3. Cross-device **data copy**

#4. Set **input data** for different subgraphs

#5. Poll **output** data

#6. **Synchronization** logic for data arrival notification
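
To make the scheduling idea concrete, here is a small self-contained Python sketch of the pipeline concept (not the C++ implementation in the PR): one worker thread per subgraph, a queue forwarding each subgraph's output to the next subgraph's input, and a stop flag for shutdown. The real runtime additionally handles cross-device data copies and runs the actual graph_executor modules.

```python
import queue
import threading

class PipelineSketch:
    """Conceptual pipeline: stage i consumes from queue i, produces to queue i+1."""

    def __init__(self, stages):
        self.stages = stages                      # list of callables, one per subgraph
        self.queues = [queue.Queue() for _ in range(len(stages) + 1)]
        self.stop_flag = threading.Event()
        self.threads = [
            threading.Thread(target=self._worker, args=(i,), daemon=True)
            for i in range(len(stages))
        ]
        for t in self.threads:
            t.start()

    def _worker(self, i):
        while not self.stop_flag.is_set():
            try:
                data = self.queues[i].get(timeout=0.1)   # wait for input (synchronization)
            except queue.Empty:
                continue
            result = self.stages[i](data)                # run subgraph i
            self.queues[i + 1].put(result)               # forward to the next subgraph

    def set_input(self, data):
        self.queues[0].put(data)

    def get_output(self):
        return self.queues[-1].get()                     # poll the final output

    def stop(self):
        self.stop_flag.set()

# Usage: two "subgraphs" that just transform a number, pipelined.
pipe = PipelineSketch([lambda x: x + 1, lambda x: x * 2])
pipe.set_input(3)
print(pipe.get_output())   # (3 + 1) * 2 = 8
pipe.stop()
```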

The following diagrams show how these files cooperate.

**Subgraph create**

![](https://www.evernote.com/shard/s201/res/f9c4ad7d-7fca-c443-be74-127bedf32b8b)

**Subgraph run**

![](https://www.evernote.com/shard/s201/res/d332a72d-2833-b7d0-275f-8858193e230a)




