Yeah, I can see the difficulty you mentioned; nvcc may well be unavailable at 
runtime if the model is deployed to an edge device.

A combined approach would leverage the third BYOC option: custom 
codegen/runtime. Specifically, we still generate the C/CUDA kernels and compile 
them with NVCC at compile time, but instead of using the C source module you're 
currently using, we treat the generated/compiled kernels as "graphs". 
Meanwhile, we serialize the constants to a JSON file, so our artifacts are the 
compiled kernels (in binary) and the constants (in JSON). This is similar to 
the Xilinx Vitis-AI and Arm Ethos-N backends, which generate a 
binary/bit-stream in the desired format and use their own runtime for 
execution.
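
To make the artifact format concrete, here is a minimal C++ sketch of the 
compile-time constant dump. It assumes fp32 constants for brevity; the 
`ConstantEntry` struct and `dump_constants_json` helper are made up for 
illustration, and a real implementation would likely use a JSON library and 
handle more dtypes:

```cpp
// Sketch only: dump named constant tensors to a JSON file at compile time.
// ConstantEntry and dump_constants_json are hypothetical names.
#include <cstdio>
#include <string>
#include <vector>

struct ConstantEntry {
  std::string name;        // name the runtime will look up, e.g. "conv0_weight"
  std::vector<int> shape;  // tensor shape
  std::vector<float> data; // flattened values (fp32 assumed for brevity)
};

void dump_constants_json(const std::vector<ConstantEntry>& consts,
                         const std::string& path) {
  FILE* fp = std::fopen(path.c_str(), "w");
  if (!fp) return;
  std::fprintf(fp, "{\n");
  for (size_t i = 0; i < consts.size(); ++i) {
    const ConstantEntry& c = consts[i];
    std::fprintf(fp, "  \"%s\": {\"shape\": [", c.name.c_str());
    for (size_t j = 0; j < c.shape.size(); ++j)
      std::fprintf(fp, "%s%d", j ? ", " : "", c.shape[j]);
    std::fprintf(fp, "], \"data\": [");
    for (size_t j = 0; j < c.data.size(); ++j)
      std::fprintf(fp, "%s%g", j ? ", " : "", c.data[j]);
    std::fprintf(fp, "]}%s\n", i + 1 < consts.size() ? "," : "");
  }
  std::fprintf(fp, "}\n");
  std::fclose(fp);
}
```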

In addition, we implement a runtime engine that loads the compiled kernels and 
deserializes the constants. This way the runtime can stay lightweight and 
should be easy to implement, because all it needs to do is invoke the 
corresponding kernel by its symbol and feed it the right data entries. Unlike 
TensorRT, we don't need a JSON interpreter that traverses a JSON subgraph and 
builds an engine.
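
For instance, the engine could be little more than `dlopen`/`dlsym` plus the 
constant loader. A minimal sketch, assuming the codegen emits one C symbol per 
subgraph with a flat pointer-based signature; the symbol name 
`byoc_subgraph_0`, the library name, and the signature are all hypothetical:

```cpp
// Sketch only: resolve a compiled kernel by symbol and call it with the
// deserialized constant plus user input. Build with: g++ engine.cc -ldl
#include <dlfcn.h>
#include <cstdio>

// Assumed calling convention emitted by our codegen: every subgraph kernel
// takes (input, constant, output) raw pointers.
using KernelFn = void (*)(const float* input, const float* weight, float* output);

int main() {
  void* lib = dlopen("./compiled_kernels.so", RTLD_NOW);
  if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

  // Resolve the kernel by the symbol recorded at compile time.
  auto kernel = reinterpret_cast<KernelFn>(dlsym(lib, "byoc_subgraph_0"));
  if (!kernel) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

  // In a real engine, `weight` comes from deserializing the constants JSON
  // and `input` from the caller; fixed-size buffers keep the sketch short.
  float input[16] = {0}, weight[16] = {0}, output[16];
  kernel(input, weight, output);

  dlclose(lib);
  return 0;
}
```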

btw, I'm also curious how @Laurawly deals with the specialized weight layout 
with the C codegen.
