# Bring your own codegen to TVM + Graph Partitioning

The goal is to come up with the right Relay subgraph data structure/abstraction 
so that we can more conveniently allow third-party library and hardware vendors 
to bring their own codegen tools to TVM.

This RFC involves design and implementation work in at least the following aspects.

* Graph coloring
  * Provide HW vendors with an infrastructure to customize where they want to execute an op.
* Graph partitioning
  * A Relay pass that partitions a program into segments that could be executed 
on various hardware platforms.
* Code generation
  * Generate code for each segment of a partitioned Relay program.
* Artifact serialization
  * Provide functionality to support save/load of the compiled artifacts.
* Runtime
  * Integrate other runtimes/execution engines or invoke the external library 
code/subgraph through both the graph runtime and the VM (the current POC 
implementation uses the VM).

### Model Coverage

* CNN: MLP, VGG, ResNet, SqueezeNet, Inception V3, etc.
* CV: SSD with ResNet 50, MobileNet, VGG-16, etc.
* NLP models are not supported well yet in Relay so we will revisit them in the 
future.
* And more...

### Coloring - Group annotated nodes into a minimum number of subgraphs.

* Problem Formulation
  * Input
    * Given a Relay graph with extern op annotations (added by users or by some 
internal mechanisms).
    * The I/O of each node (op) may or may not have annotations to indicate if 
this node is suggested to be offloaded.
    
  * Output
    * A graph with a minimal set of annotations on edges indicating the 
boundaries of subgraphs.
  
* Implementation 1: Op-level annotation
  * For each op, we have a corresponding checker function registered, and the 
checker is invoked at compile time to indicate whether we should annotate the 
op for the 3rd party accelerator to offload. For example, the following shows 
a checker of `conv2d`:

    ```python
    @reg.register_extern_op("nn.conv2d")
    def conv2d(attrs, args, comp):
        return get_extern_op(comp, "conv2d")(attrs, args)
    ```

    * Note that `comp` is a string representing the 3rd party compiler name; 
`get_extern_op` uses `hasattr` and `getattr` to obtain the checker specified 
by the 3rd party.
  * HW partners/3rd party libraries only need to implement simple checker 
functions for each op to specify whether they can support the op under 
certain conditions. The following example shows a case where the accelerator 
only supports `conv2d` with floating-point types:

    ```python
    def conv2d(attrs, args):
        type = args[0].output_type_.dtype
        return (type == 'float32' or type == 'float64')
    ```

    * Note that HW partners do not need to register this function; they just need 
to implement it under Relay `backend/contrib/compiler_name` so that the function 
can be discovered and imported dynamically.
  * A Relay IR pass in Python will invoke the above functions, insert annotations 
into the graph, and run Algorithm 1 for coloring.
* Implementation 2: Subgraph-level annotation
  * We also provide an option for HW partners to annotate the graph directly. 
In this case, they have to implement a Relay IR pass that uses our APIs to 
insert the boundary annotations (i.e., `subgraph_start` and `subgraph_end` ); a 
minimal sketch of such a pass is shown after this list.
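
Below is a minimal, hypothetical sketch of such an annotation pass. The 
proposed `subgraph_start`/`subgraph_end` boundary ops are not upstream yet, so 
they are stubbed out here as identity placeholders; the `WhiteListAnnotator` 
name and the calling convention are assumptions for illustration only, and the 
sketch uses a recent Relay Python API.

```python
import tvm
from tvm import relay
from tvm.relay.expr_functor import ExprMutator


def subgraph_start(expr):
    # Placeholder for the boundary annotation op proposed in this RFC.
    return expr


def subgraph_end(expr):
    # Placeholder for the boundary annotation op proposed in this RFC.
    return expr


class WhiteListAnnotator(ExprMutator):
    """Wrap every supported call with boundary annotations."""

    def __init__(self, supported_ops):
        super().__init__()
        self.supported_ops = supported_ops

    def visit_call(self, call):
        new_args = [self.visit(arg) for arg in call.args]
        if isinstance(call.op, tvm.ir.Op) and call.op.name in self.supported_ops:
            # Annotate the inputs and output of the supported op.
            new_args = [subgraph_start(arg) for arg in new_args]
            new_call = relay.Call(call.op, new_args, call.attrs, call.type_args)
            return subgraph_end(new_call)
        return relay.Call(call.op, new_args, call.attrs, call.type_args)


# Example: mark conv2d as offloadable.
x = relay.var("x", shape=(1, 3, 224, 224))
w = relay.var("w", shape=(16, 3, 3, 3))
func = relay.Function([x, w], relay.nn.conv2d(x, w))
annotated = WhiteListAnnotator({"nn.conv2d"}).visit(func)
```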

### Partitioning - Check/Validate the graph and process graph I/Os.

* Problem Formulation
  * Input
    * Given a Relay program with boundary annotations (i.e., `subgraph_start` 
and `subgraph_end` ).
    * The boundary annotations can be added by the coloring stage. In this 
case, the boundary annotations are always valid.
    * Users can directly add boundary annotations to their Relay programs. In 
this case, we need to validate the annotations before partitioning.
  * Output
    * The updated Relay program with subgraphs replaced by sub functions. All 
annotations should be removed, and calls should be inserted to invoke the sub 
functions (see the illustrative sketch below).
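
For illustration only, here is a hand-constructed example of the expected shape 
of the output (not produced by the pass itself), using a recent Relay Python 
API. The `External` attribute name and the `mycompiler` value are placeholders, 
not the RFC's final convention; in practice the lifted function would likely be 
added to the module and called via a `GlobalVar`.

```python
import tvm
from tvm import relay

# The lifted subgraph (e.g., an annotated conv2d + relu region).
sx = relay.var("sx", shape=(1, 3, 224, 224))
sw = relay.var("sw", shape=(16, 3, 3, 3))
sub_fn = relay.Function([sx, sw], relay.nn.relu(relay.nn.conv2d(sx, sw)))
sub_fn = sub_fn.with_attr("External", "mycompiler")  # assumed marker attribute

# The main function now just calls the lifted subgraph.
data = relay.var("data", shape=(1, 3, 224, 224))
weight = relay.var("weight", shape=(16, 3, 3, 3))
main_fn = relay.Function([data, weight], relay.Call(sub_fn, [data, weight]))
mod = tvm.IRModule.from_expr(main_fn)
print(mod)
```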
    
### Codegen - Tell the Relay backend to use external codegen instead of TVM.

* Invoke different codegen tools from TVM directly. **This requires HW partners 
to register their codegen tool with TVM as a runtime module.**
* During compilation, we can traverse the graph and check the attributes of 
different subgraphs. For example, an external codegen tool has to be invoked 
once we find that a subgraph's attribute is annotated with an external 
compiler. For the example above, we can generate a runtime module for the 1x1 
conv, but we have to invoke external compilers to generate code for the two subgraphs.
  * How to register?
    * **HW vendors need to register their compiler as a runtime module and at 
least be able to deal with the following tasks:**
      * Ingest a Relay function/module and compile it.
      * Ingest TVM input data structures, e.g. **NDArray**. TVM feeds data in 
the NDArray format to the subgraph and expects the external accelerator to 
execute it and return the output as an **NDArray** as well. Therefore, HW 
vendors need to consider the conversion between TVM data and whatever data 
format is compatible with their compiler.
      * Implement the virtual functions of a `runtime::ModuleNode` , i.e. 
`SaveToFile` , `SaveToBinary` , `GetSource` , `GetFunction` , etc. 
`GetFunction` is particularly important because that’s how we get the 
function pointer of a subgraph and invoke it at runtime (see the sketch at the 
end of this section). An example of the registration of the CUDA runtime module 
is here: 
http://tracking.discuss.tvm.ai/tracking/click?d=S2hLLKhuOQAF6f2PuXGBd6L9G10Fye2-X-L0Ooh4cX2XQkMwNrM562pOXIDFp6Kh2Pn-3_o3Am-YYHDANyVvo6z8X7KcRkfCvBQIbKWROmB8fvunmtNQbn0HDwg2WjcHkjPpFxYulb9mJaH8ZotAxYJFeTk46ewEBHBP1Hwbj9-J0nXPJtEawwqjumSxOk_0eQ2
  * What APIs we need to expose?
    * The major APIs would be similar to those of the codegen tools currently 
baked into TVM, e.g. LLVM and CUDA.
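
To make the role of `GetFunction` concrete, here is a rough Python-side sketch, 
assuming such an artifact exists: the file name `deploy_ext.so` and the symbol 
`subgraph_0` are hypothetical, the API shown is recent TVM, and the exact 
calling convention between TVM and the external subgraph is part of this RFC's 
design rather than settled.

```python
import numpy as np
import tvm

# Hypothetical deployed library that embeds an externally compiled subgraph.
lib = tvm.runtime.load_module("deploy_ext.so")

# This lookup is served by the vendor module's GetFunction implementation.
subgraph_fn = lib.get_function("subgraph_0")

# TVM hands NDArrays to the external subgraph and expects NDArrays back;
# the exact in/out convention is still to be defined in this RFC.
data = tvm.nd.array(np.random.rand(1, 3, 224, 224).astype("float32"))
out = subgraph_fn(data)
```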

### Serialization - Save the subgraphs and load them back

  * TVM serializes the built artifact into a JSON graph, params, and a library. 
What new requirements do the subgraphs bring? Each HW vendor has their own 
artifacts. For example, they may encode the structure of the subgraph into the 
library, and they may need the params and even modify them.
  * Serialize the partitioned subgraphs into a form that can be saved to disk.
  * Do we need to let HW partners know what ops are in a subgraph? We should 
treat a subgraph as a black box: just feed it with input data and expect to get 
the correct output from the external library.
  * How many libraries? We may generate multiple libraries, one for each backend.
    * How to load multiple libraries and let the subgraph invoke the correct 
library?
    * Can we combine them into a single fat library if the external codegen tool 
is registered with TVM as a runtime module? (A sketch of the existing save/load 
flow that this would extend is shown below.)
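
For reference, the sketch below shows the existing TVM save/load flow (recent 
TVM Python API; module names differed at the time of this RFC) that a 
fat-library approach could extend: `export_library` already serializes modules 
imported into the host module via their `SaveToBinary`, so an external codegen 
module registered this way could potentially ride along in the same `.so`. How 
vendor artifacts are actually embedded or shipped alongside is the open 
question of this section.

```python
import tvm
from tvm import relay

# A tiny model built with plain TVM codegen (no external subgraphs), just to
# demonstrate the artifact flow.
x = relay.var("x", shape=(1, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))
lib = relay.build(mod, target="llvm")

lib.export_library("deploy.so")            # imported modules are serialized too
loaded = tvm.runtime.load_module("deploy.so")
```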

### Runtime - Invoke different subgraphs using different libraries

  * Graph runtime and VM runtime.
  * Offload a subgraph to the 3rd party library.
  * How to invoke the library and let it take control of the subgraph?
  * Two cases
    * HW vendors have their own runtime.
      * How do we coordinate the two runtimes?
    * HW vendors don’t have their own runtime.
      * Only the TVM runtime is used. We still need the library generated by the 
external compiler to be able to ingest TVM runtime data and finish the 
execution of a subgraph (see the host-side sketch below).
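
As a sketch of the host-side flow, the runtime only sees NDArrays going in and 
out; whether an operator was compiled by TVM or by an external library is 
hidden behind the deployed module. The sketch uses the graph executor API of 
recent TVM (called `graph_runtime` at the time of this RFC) on a plain TVM 
model; in the BYOC flow, some subgraphs would instead be dispatched to the 
external library, and the VM flow in the PoC is analogous.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor  # graph_runtime in older TVM versions

# Stand-in model; partitioned subgraphs would be served by the external library.
x = relay.var("x", shape=(1, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))
lib = relay.build(mod, target="llvm")

dev = tvm.cpu(0)
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("x", tvm.nd.array(np.random.rand(1, 8).astype("float32")))
rt.run()
print(rt.get_output(0).numpy())
```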

We have an initial implementation here: 
http://tracking.discuss.tvm.ai/tracking/click?d=EXYpMKlXjzU9_LenuwG50I7mn9DGv6igv_REgP752D7W_58-S01b6cPIvAJThFof_IX7UG30_0SzQ2__d5BM8__K7x_JZdSBEX7j4fkZbR_1tn6ef8hDNOFilzAQGCY-Q07ZwUwViFvsWYagyWMtqzk1, 
where we provide support for MKLDNN using the DNNL execution engine and a simple 
experimental version that allows GCC to ingest NDArray and compile a simple graph. 
Thanks to @jroesch for providing many suggestions. Part of the credit also goes 
to @comaniac for working together on this.

Any comments and thoughts are welcome :)

@tqchen  @wweic @haichen @thierry @ajtulloch @jonso @janimesh  @ciphr




