On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> The approach I've taken in libgomp/nvptx is to have a single entry point,
> gomp_nvptx_main, that can take care of initial allocation, transferring
> control to the target region function, and finalization.
>
> At the moment it has the prototype:
>
>   void gomp_nvptx_main(void (*fn)(void*), void *fndata);
>
> but it's plausible that down the road we'll need other arguments for
> passing data allocated by the plugin.
>
> I see two possible ways to arrange that.
>
> 1. Make gomp_nvptx_main a .kernel function. This is what this patch
> assumes. This requires emitting pointers-to-target-region-functions from
> the compiler, and looking them up via cuModuleGetGlobal/cuMemcpyDtoH in
> the plugin.
>
> 2. Make gomp_nvptx_main a device (.func) function. To have that work, we'd
> need to additionally emit a "trampoline" of sorts in the NVPTX backend.
> For each OpenMP target entrypoint foo$_omp_fn$0, we'd have to additionally
> emit
>
>   __global__ void foo$_omp_fn$0$entry(void *args)
>   {
>     gomp_nvptx_main(foo$_omp_fn$0, args);
>   }

Wouldn't it be simpler to generate a .kernel for every target region function (as OpenACC does)? That could be a small stub in each case which just calls gomp_nvptx_main with the right function pointer. We already have the machinery to look up the right kernel corresponding to a host address and invoke it, so I think we should just reuse that functionality.
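
To make the stub idea concrete, here is a rough host-side sketch in plain C. It is only an illustration, not what the compiler or libgomp actually emits: KERNEL_ENTRY stands in for the .kernel/__global__ marker, the body of gomp_nvptx_main is a placeholder, and foo_omp_fn_0, foo_omp_fn_0_entry and regions_run are hypothetical names (with '$' replaced by '_' so the code is portable C).

```c
#include <assert.h>
#include <stddef.h>

/* On the device this would mark a .kernel entry point (__global__ in CUDA C).
   It expands to nothing here so the sketch builds as ordinary host C. */
#define KERNEL_ENTRY

/* Hypothetical counter, present only so the sketch has observable state. */
static int regions_run = 0;

/* Placeholder for the single libgomp entry point: set up, run the target
   region function, tear down. The real setup/teardown is not shown. */
void gomp_nvptx_main(void (*fn)(void *), void *fndata)
{
    assert(fn != NULL);
    /* initial allocation would happen here */
    fn(fndata);
    regions_run++;
    /* finalization would happen here */
}

/* A target region function as the compiler might outline it
   (hypothetical name and body). */
static void foo_omp_fn_0(void *data)
{
    int *x = data;
    *x += 1;
}

/* The small per-region stub: one kernel per target region, which only
   forwards the right function pointer to gomp_nvptx_main. */
KERNEL_ENTRY void foo_omp_fn_0_entry(void *args)
{
    gomp_nvptx_main(foo_omp_fn_0, args);
}
```

With one such stub per region, the plugin would launch foo_omp_fn_0_entry through the existing host-address-to-kernel lookup, and no separate table of function pointers would have to be read back from the device.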
Bernd
