I am new to TVM and am reading through its source code. I'm interested in GPU kernel generation and kernel loading. As I understand it, the GPU kernel is generated and compiled to PTX when tvm.build() runs. At runtime, TVM then calls the CUDA driver API functions cuModuleLoadData(), cuModuleGetFunction() and cuLaunchKernel() to load and launch the generated kernel. I think this involves a compilation step at runtime, in which the PTX is turned into executable code. That compilation might run many times when multiple processes launch the same generated kernel. Does this runtime compilation degrade performance?
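To make the concern concrete, here is a toy model of the load path I have in mind. It makes no real CUDA or TVM calls: `ToyDriver`, `module_load_data`, `count_jit_compiles` and the `PTX` string are hypothetical names invented for illustration. It only counts how often a load-time JIT step would run when several processes load the same PTX, with and without a cache shared between them. I believe the CUDA driver keeps an on-disk compute cache for JIT results, but whether it applies to TVM's load path is exactly what I would like to confirm.

```python
import hashlib

# Placeholder for the PTX text that tvm.build() would produce (not real PTX).
PTX = "// generated kernel PTX would go here"


class ToyDriver:
    """Stand-in for a GPU driver inside one process. NOT the real CUDA API."""

    def __init__(self, disk_cache=None):
        # disk_cache models a cache shared across processes; None disables it.
        self.disk_cache = disk_cache
        self.jit_count = 0

    def module_load_data(self, ptx):
        """Model of a module load: JIT-compile the PTX unless it is cached."""
        key = hashlib.sha256(ptx.encode()).hexdigest()
        if self.disk_cache is not None and key in self.disk_cache:
            return self.disk_cache[key]  # cache hit, no compilation
        self.jit_count += 1  # cache miss, pay the JIT cost
        binary = "sass:" + key[:8]  # fake "executable code"
        if self.disk_cache is not None:
            self.disk_cache[key] = binary
        return binary


def count_jit_compiles(num_processes, use_cache):
    """Total JIT compilations when each process loads the same PTX once."""
    shared = {} if use_cache else None
    total = 0
    for _ in range(num_processes):
        driver = ToyDriver(shared)  # one driver instance per "process"
        driver.module_load_data(PTX)
        total += driver.jit_count
    return total
```

In this model, every process pays the compile cost if nothing is cached, while a shared cache reduces it to a single compilation for the first process to load the kernel.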