I do not disagree. The sticky point is how we categorize the "host driving" part of the target program (memory allocation, kernel launch parameter computation).
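For concreteness, the host-driving portion is usually just small arithmetic such as the launch-parameter computation below (a minimal Python sketch; `launch_params` and the default block size are illustrative, not part of any TVM API):

```python
def launch_params(n_elems, threads_per_block=256):
    """Compute a 1-D kernel launch configuration on the host.

    This ceil-division is the kind of "host driving" computation
    referred to above: it runs on the CPU even when the program as
    a whole is categorized as a single-target gpu program.
    """
    grid = (n_elems + threads_per_block - 1) // threads_per_block
    return grid, threads_per_block
```

For example, `launch_params(1000)` returns `(4, 256)`: four blocks of 256 threads cover 1000 elements.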
We do not intend to categorize an arbitrary CPU + GPU program as a "gpu program". Under V0, a device target (with a target host) program can include:

- Host code that calls into device workspace allocation.
- Launches of device kernels.

It cannot include:

- Reads/writes to memory belonging to another device (including the host CPU).

To give some examples:

### E0

This is not a "gpu program", because it mixes a CPU part and a CUDA part:

```
fn cpugpumix() {
    // cpu part
    a = cpu_malloc(size)
    for i: a[i] = ...
    b = cuda_malloc(size)
    memcpy_to_cuda(b, a)
    // cuda part
    launch cuda kernel 1 { }
    launch cuda kernel 2 { }
}
```

### E1

This is a gpu program:

```
fn gpuonly() {
    // cuda part
    b = alloc("global", size)
    launch cuda kernel 1 { }
    launch cuda kernel 2 { }
}
```

### E2

This mixes CUDA, CPU, and DSP regions, so it is a multi-device program rather than a single "gpu program":

```
fn multidevice() {
    // cuda program region
    a = alloc(size, scope="global")
    launch kernel 1 {
        a[threadIdx.x] = ...
    }
    // cpu region: stage the data through host memory
    memcpy_cuda_to_cpu(mcpu, a)
    memcpy_cpu_to_dsp(b, mcpu)
    // dsp program region
    launch dsp kernel { }
}
```

## Discussions

In the case of E1, one can imagine moving all of the code, including the workspace allocation, onto the device itself (this is what can happen in the DSP case). That is why it can be categorized as a single-target gpu program, and why, from the programming model's point of view, it is useful to annotate the region with a non-composite target. In reality, though, the `alloc` and the kernel launch parameter calculation happen on the host, so the target host is a useful way to specify how to generate the host support code for this kind of kernel.

---

[Visit Topic](https://discuss.tvm.ai/t/rfc-tvm-target-specification/6844/16) to respond.