yaoyaoding commented on issue #292: URL: https://github.com/apache/tvm-ffi/issues/292#issuecomment-3594338242
Thanks @oraluben for the proposal and your willingness to contribute!

**Runtime API vs Driver API**

I also think it's a good idea to switch from the runtime API to the driver API so that we drop the dependency on the CUDA runtime. It's recommended to use the modern [library management APIs](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__LIBRARY.html#group__CUDA__LIBRARY) instead of the module management APIs. They decouple the kernel from the CUDA context, which makes the multi-GPU case easier to support. [This version](https://github.com/apache/tvm-ffi/pull/283/files/b12edd1c035b8f32d7750258233e7db15f59cf6b#diff-328af6a8e19e7712dd600bd593241491af8542003456c96744958f8e0114203a) used the library API and can serve as a reference.

**Embed cubin**

Currently, we use link-stage tools to embed the cubin into an object file. In the proposal, @oraluben suggests using `bin2c` instead. For context, the CUDA toolkit ships `bin2c`, which translates a binary file such as a cubin into a C source file that holds the file's contents in a byte array.

```cuda
// a.cu content
extern "C" __global__ void AddOneKernel(float* x) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  x[idx] += 1;
}

extern "C" __global__ void AddOneKernel_AA(float* x) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  x[idx] += 1;
}
```

```sh
$ bin2c a.cu --name my_kernel
```

```c
// output of bin2c
#ifdef __cplusplus
extern "C" {
#endif
unsigned char my_kernel[] = {
0x65,0x78,0x74, ...
};
#ifdef __cplusplus
}
#endif
```

`bin2c` embeds whatever file it is given, byte for byte. Here it was run on the source file `a.cu`, so the leading bytes `0x65,0x78,0x74` are the ASCII text `ext` from `extern`. In practice it would be run on the cubin produced by compiling `a.cu` (e.g. with `nvcc -cubin`).

I wasn't aware of this tool. I now prefer `bin2c` over the current method of embedding with link-stage tools.

**Interface**

We might need more discussion on the design of the interface.
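Any interface we settle on must declare the symbol that the generated C file defines. As a reference point for that discussion, here is a minimal C++ sketch of a generator that emits the same layout as the `bin2c` output shown above. `EmitCArraySource` is a hypothetical helper written only for illustration. It is not part of the CUDA toolkit or tvm-ffi, and it does not reproduce any of `bin2c`'s options.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of the CUDA toolkit or tvm-ffi): returns C
// source text that defines `unsigned char <name>[]` holding `bytes`, wrapped
// in an `extern "C"` guard, mirroring the layout of the bin2c output above.
std::string EmitCArraySource(const std::vector<unsigned char>& bytes,
                             const std::string& name) {
  assert(!name.empty() && "array name must not be empty");
  // An empty initializer would declare a zero-length array, which C rejects.
  assert(!bytes.empty() && "input must contain at least one byte");
  std::string out = "#ifdef __cplusplus\nextern \"C\" {\n#endif\n";
  out += "unsigned char " + name + "[] = {";
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    char hex[8];
    // Emit each byte as a 0xNN literal, comma-separated as in the sample.
    std::snprintf(hex, sizeof(hex), "%s0x%02x", i == 0 ? "" : ",",
                  static_cast<unsigned>(bytes[i]));
    out += hex;
  }
  out += "};\n#ifdef __cplusplus\n}\n#endif\n";
  return out;
}
```

In the workflow, the returned string would be written to `cubin.c` and compiled like any other C file. The `extern "C"` guard keeps the array's symbol unmangled when the file is compiled as C++, so an `extern "C"` declaration elsewhere can resolve it at link time.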
This is the current definition of `TVM_FFI_EMBED_CUBIN`:

```c++
#define TVM_FFI_EMBED_CUBIN(name)                          \
  extern "C" const char __tvm_ffi__cubin_##name[];         \
  extern "C" const char __tvm_ffi__cubin_##name##_end[];   \
  namespace {                                              \
  struct EmbedCubinModule_##name {                         \
    tvm::ffi::CubinModule mod{__tvm_ffi__cubin_##name};    \
    static EmbedCubinModule_##name* Global() {             \
      static EmbedCubinModule_##name inst;                 \
      return &inst;                                        \
    }                                                      \
  };                                                       \
  } /* anonymous namespace */
```

We can change it to something like the following, which simply removes the `_end` symbol:

```c++
#define TVM_FFI_EMBED_CUBIN(name)                          \
  extern "C" const unsigned char __tvm_ffi__cubin_##name[]; \
  namespace {                                              \
  struct EmbedCubinModule_##name {                         \
    tvm::ffi::CubinModule mod{__tvm_ffi__cubin_##name};    \
    static EmbedCubinModule_##name* Global() {             \
      static EmbedCubinModule_##name inst;                 \
      return &inst;                                        \
    }                                                      \
  };                                                       \
  } /* anonymous namespace */
```

The workflow would look like this:

- use `bin2c` to generate `cubin.c` from a cubin binary, with the array named `__tvm_ffi__cubin_##name`;
- compile `cubin.c` to `cubin.o`;
- link the user's object with `cubin.o`;
- localize the cubin symbol.

We have defined some Python/CMake utilities [here](https://github.com/apache/tvm-ffi/blob/main/python/tvm_ffi/utils/embed_cubin.py) and [here](https://github.com/apache/tvm-ffi/blob/main/cmake/Utils/EmbedCubin.cmake) that might need to be updated.

@oraluben @tqchen I'd be happy to hear your opinions.
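To check that the proposed macro still works without the `_end` symbol, here is a self-contained C++ sketch. The `tvm::ffi::CubinModule` below is a hypothetical mock that only records the image pointer. The real class would presumably hand the image to the driver. The `__tvm_ffi__cubin_demo` array stands in for what the `bin2c`-generated `cubin.c` would provide at link time. Its bytes are just the ELF magic number, since a cubin is an ELF file. Dropping the size symbol looks plausible because the driver's loading APIs, such as `cuLibraryLoadData`, take only a pointer to the image, not its length.

```cpp
#include <cassert>

// Hypothetical mock of tvm::ffi::CubinModule, for illustration only: it just
// stores the image pointer instead of loading it through the CUDA driver.
namespace tvm {
namespace ffi {
class CubinModule {
 public:
  explicit CubinModule(const unsigned char* image) : image_(image) {
    assert(image != nullptr);
  }
  const unsigned char* image() const { return image_; }

 private:
  const unsigned char* image_;
};
}  // namespace ffi
}  // namespace tvm

// The proposed macro: only the start symbol is declared, no `_end` symbol.
#define TVM_FFI_EMBED_CUBIN(name)                          \
  extern "C" const unsigned char __tvm_ffi__cubin_##name[]; \
  namespace {                                              \
  struct EmbedCubinModule_##name {                         \
    tvm::ffi::CubinModule mod{__tvm_ffi__cubin_##name};    \
    static EmbedCubinModule_##name* Global() {             \
      static EmbedCubinModule_##name inst;                 \
      return &inst;                                        \
    }                                                      \
  };                                                       \
  } /* anonymous namespace */

// Stand-in for the bin2c-generated cubin.c, which would normally be a separate
// object linked in. The bytes are only the ELF magic number, not a real cubin.
extern "C" const unsigned char __tvm_ffi__cubin_demo[] = {0x7f, 0x45, 0x4c, 0x46};

TVM_FFI_EMBED_CUBIN(demo)
```

`EmbedCubinModule_demo::Global()` constructs the module on its first call and returns the same instance afterwards. The sample `bin2c` output above defines a non-`const` array, while the macro declares `const unsigned char`. The sketch defines the array as `const` so that the declaration and the definition agree.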
