The module performs a *Linear* transform, and I dispatch it to the pre-scheduled op in MLC-LLM's dolly-v2-3b. I profile it by running the script above under Nsight Compute. The resulting Nsight Compute report contains 5 kernels named `fused_NT_matmul1_add3`. However, the script runs inference only once, and the prim_func `fused_NT_matmul1_add3` is called only once in the relax_func `smallLernels`. Intuitively, there should be only one CUDA kernel. This is quite confusing; could anyone help explain why?
[Apache TVM Discuss] [Questions] [unity] confuse about cuda kernel codegen
TaoWei via Apache TVM Discuss Wed, 28 Jun 2023 02:35:30 -0700
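Nsight Compute records every profiled kernel launch as a separate result with its own ID, so five entries sharing a name may be five launches of one kernel rather than five distinct kernels. The sketch below counts distinct launch IDs per kernel name in a CSV export of the report. It assumes the report was exported with something like `ncu --import <report>.ncu-rep --csv` and that the export has `ID` and `Kernel Name` columns. Column names may differ between Nsight Compute versions, so both are parameters. `count_launches` is a hypothetical helper, not part of TVM or Nsight Compute.

```python
import csv
from collections import defaultdict
from typing import Dict


def count_launches(csv_text: str,
                   name_col: str = "Kernel Name",
                   id_col: str = "ID") -> Dict[str, int]:
    """Return {kernel name: number of distinct launch IDs} for an ncu CSV export.

    Assumption: each profiled launch has one ID. In the default "details"
    page a single launch spans many rows (one per metric), so rows are
    de-duplicated by ID instead of simply being counted.
    """
    # Drop empty lines and profiler log lines such as "==PROF== ..." that
    # ncu may print around the CSV when writing to stdout.
    lines = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("==")]
    launches = defaultdict(set)
    for row in csv.DictReader(lines):
        name = (row.get(name_col) or "").strip()
        launch_id = (row.get(id_col) or "").strip()
        # Skip rows without a numeric launch ID (e.g. a units row).
        if not name or not launch_id.isdigit():
            continue
        launches[name].add(launch_id)
    return {name: len(ids) for name, ids in launches.items()}


if __name__ == "__main__":
    # Made-up toy input, not data from the report discussed above.
    toy = (
        '"ID","Kernel Name","Metric Name"\n'
        '"0","kernel_a","dram__bytes.sum"\n'
        '"0","kernel_a","gpu__time_duration.sum"\n'
        '"1","kernel_a","dram__bytes.sum"\n'
        '"2","kernel_b","dram__bytes.sum"\n'
    )
    print(count_launches(toy))  # {'kernel_a': 2, 'kernel_b': 1}
```

If the count for `fused_NT_matmul1_add3` comes out as 5, the report holds five launches of that kernel. The next step would then be to find what launches it more than once, rather than looking for five different generated kernels.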