![1|690x265, 100%](upload://ggfLLx4q3Ra9yUL8sApvNG503NO.png) ![2|690x359, 75%](upload://hqovdu5MuxFvjsn6sCdvMTh4y3l.png)

![3|690x53](upload://7sy5FdLPkT17u32r8OAyVPmsziu.png) 

The module performs a *Linear* transform, and I dispatch it to the pre-scheduled op 
from MLC-LLM dolly-v2-3b.

I profiled the script above with Nsight Compute.
The profile report shows 5 kernels named `fused_NT_matmul1_add3`.
But the script runs inference only once, and the prim_func 
`fused_NT_matmul1_add3` is called only once in the Relax function 
`smallLernels`, so intuitively there should be only one CUDA kernel.
This is quite confusing. Could anyone help explain why?
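
To make the count easier to check, here is a minimal sketch that tallies distinct launches per kernel name from a CSV export of the report. It assumes the report was exported with something like `ncu --import report.ncu-rep --csv` and that the export has `ID` and `Kernel Name` columns, with one row per metric per launch. The file name, the helper `launches_per_kernel`, and the sample rows are hypothetical and are not taken from my actual profile.

```python
import csv
import io
from collections import defaultdict


def launches_per_kernel(csv_text):
    """Map each kernel name to the sorted list of distinct launch IDs.

    Assumes the CSV has "ID" and "Kernel Name" columns and that one launch
    may span several rows (one row per collected metric).
    """
    ids = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ids[row["Kernel Name"]].add(row["ID"])
    return {name: sorted(v, key=int) for name, v in ids.items()}


# Made-up rows for illustration only; kernel names and values are hypothetical.
SAMPLE = '''"ID","Kernel Name","Grid Size","Metric Name","Metric Value"
"0","kernel_a","(10, 1, 1)","Duration","1"
"0","kernel_a","(10, 1, 1)","Cycles","2"
"1","kernel_a","(20, 1, 1)","Duration","1"
"2","kernel_b","(10, 1, 1)","Duration","1"
'''

counts = launches_per_kernel(SAMPLE)
for name, launch_ids in counts.items():
    # Several IDs under one name means several separate launches were recorded.
    print(name, len(launch_ids), launch_ids)
```

If the five `fused_NT_matmul1_add3` entries have distinct IDs, they are separate launches rather than repeated metric rows for a single launch, and comparing their grid and block sizes might hint at where the extra launches come from.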




