On 10/20/2015 08:34 PM, Alexander Monakov wrote:
The approach I've taken in libgomp/nvptx is to have a single entry point,
gomp_nvptx_main, that can take care of initial allocation, transferring
control to target region function, and finalization.

At the moment it has the prototype:
void gomp_nvptx_main(void (*fn)(void*), void *fndata);

but it's plausible that down the road we'll need other arguments for passing
data allocated by the plugin.

I see two possible ways to arrange that.

1.  Make gomp_nvptx_main a .kernel function.  This is what this patch assumes.
This requires emitting pointers-to-target-region-functions from the compiler,
and looking them up via cuModuleGetGlobal/cuMemcpyDtoH in the plugin.

2.  Make gomp_nvptx_main a device (.func) function.  To have that work, we'd
need to additionally emit a "trampoline" of sorts in the NVPTX backend.  For
each OpenMP target entrypoint foo$_omp_fn$0, we'd have to additionally emit

__global__ void foo$_omp_fn$0$entry(void *args)
{
    gomp_nvptx_main(foo$_omp_fn$0, args);
}

Wouldn't it be simpler to generate a .kernel for every target region function (as OpenACC does)? That could be a small stub in each case which just calls gomp_nvptx_main with the right function pointer. We already have the machinery to look up the right kernel corresponding to a host address and invoke it, so I think we should just reuse that functionality.
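
To make the stub idea concrete, here is a rough host-side model in plain C of the control flow I have in mind. It is only a sketch: the names foo_omp_fn_0 and foo_omp_fn_0_entry are hypothetical stand-ins (with underscores in place of '$'), the body of gomp_nvptx_main is a placeholder, and on the device the stub would be emitted as a .kernel rather than an ordinary function.

```c
/* Host-side stand-in for the libgomp entry point.  The real
   gomp_nvptx_main would do initial allocation before the call and
   finalization after it; both are only marked by comments here.  */
static void gomp_nvptx_main(void (*fn)(void *), void *fndata)
{
    /* initial allocation would go here */
    fn(fndata);
    /* finalization would go here */
}

/* Hypothetical outlined target region body (stands in for
   foo$_omp_fn$0).  It just increments an int so the effect of the
   call is observable.  */
static void foo_omp_fn_0(void *data)
{
    int *counter = data;
    *counter += 1;
}

/* Per-region stub.  On the device this is what would be emitted as a
   .kernel; it only forwards the right function pointer to the shared
   entry point.  */
void foo_omp_fn_0_entry(void *args)
{
    gomp_nvptx_main(foo_omp_fn_0, args);
}
```

With this arrangement the plugin only ever launches the per-region stub, so gomp_nvptx_main itself never has to be looked up or launched as a kernel.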


Bernd
