On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> The approach I've taken in libgomp/nvptx is to have a single entry point,
> gomp_nvptx_main, that can take care of initial allocation, transferring
> control to target region function, and finalization.
> At the moment it has the prototype:
>     void gomp_nvptx_main(void (*fn)(void*), void *fndata);
> but it's plausible that down the road we'll need other arguments for passing
> data allocated by the plugin.
> I see two possible ways to arrange that.
> 1. Make gomp_nvptx_main a .kernel function. This is what this patch assumes.
> This requires emitting pointers-to-target-region-functions from the compiler,
> and looking them up via cuModuleGetGlobal/cuMemcpyDtoH in the plugin.
> 2. Make gomp_nvptx_main a device (.func) function. To have that work, we'd
> need to additionally emit a "trampoline" of sorts in the NVPTX backend. For
> each OpenMP target entrypoint foo$_omp_fn$0, we'd have to additionally emit
>     __global__ void foo$_omp_fn$0$entry(void *args)
>     {
>         gomp_nvptx_main(foo$_omp_fn$0, args);
>     }
Wouldn't it be simpler to generate a .kernel for every target region
function (as OpenACC does)? That could be a small stub in each case
which just calls gomp_nvptx_main with the right function pointer. We
already have the machinery to look up the right kernel corresponding to
a host address and invoke it, so I think we should just reuse that
functionality.
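To make the stub idea concrete, here is a rough host-side model in plain C. It is only a sketch, not PTX and not actual libgomp code: the counters, the body of gomp_nvptx_main, and the names foo_omp_fn_0 and foo_omp_fn_0_entry are all made up for illustration (the '$' in the real symbol names is replaced with '_' so that this is portable C).

```c
/* Hypothetical counters standing in for the real setup/teardown work. */
static int init_count;
static int fini_count;

/* Single shared entry point, modelled on the prototype quoted above.
   The body is an assumption: it only records that setup and teardown ran. */
static void gomp_nvptx_main(void (*fn)(void *), void *fndata)
{
    init_count++;   /* placeholder for initial allocation */
    fn(fndata);     /* transfer control to the target region function */
    fini_count++;   /* placeholder for finalization */
}

/* A target region function as the compiler might outline it
   (illustrative name and body). */
static void foo_omp_fn_0(void *data)
{
    int *value = data;
    *value += 1;
}

/* Small per-region stub. On the device this would be the .kernel that the
   plugin looks up and launches; here it is an ordinary function. */
static void foo_omp_fn_0_entry(void *args)
{
    gomp_nvptx_main(foo_omp_fn_0, args);
}
```

Calling foo_omp_fn_0_entry(&v) from a host-side main runs the region between the two placeholder steps. The point of the arrangement is that each stub hard-codes its own function pointer, so the plugin only ever needs to find and launch a kernel and never has to read function pointers back from device memory.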
Bernd