Hi!

On Tue, 10 Feb 2015 11:20:24 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Feb 10, 2015 at 11:16:22AM +0100, Martin Jambor wrote:
> > On Mon, Jan 12, 2015 at 12:22:44AM +0300, Ilya Verbin wrote:
> > > Currently if a target* pragma appears within a target region, GCC
> > > successfully compiles such code (with a warning).  But the binary fails
> > > at run-time, since it tries to call GOMP_target* functions on target.
> > > 
> > > The spec says: "If a target, target update, or target data construct
> > > appears within a target region then the behavior is unspecified."
> > > 
> > > I see 2 options to make the behavior more user-friendly:
> > > 1. To return an error at compile-time.
> > > 2. To check at run-time in libgomp whether GOMP_target* is called on
> > >    target, and perform target-fallback if so.
> > > 
> > 
> > What actually happens when an accelerator calls a libgomp function?
> > Is a target libgomp port invoked?  If so, it should easily know it
> > runs on a target even without a run-time check, I suppose.  Or do you
> > somehow bring that call back to the host?
> 
> The spec says that it is undefined behavior to invoke
> #pragma omp target {,data,update} from within #pragma omp target region.

We're not currently implementing that, but let me mention that OpenACC
describes a concept of nested parallelism:

OpenACC 2.0a, 1.2 Execution Model:

    [...]
    On some devices, the accelerator may also create and launch parallel
    kernels, allowing for nested parallelism. In that case, the OpenACC
    directives may be executed by a host thread or an accelerator thread. [...]

OpenACC 2.0a, 2.6 Data Environment:

    [...] When a nested OpenACC construct is executed on the device, the
    default target device for that construct is the same device on which the
    encountering accelerator thread is executing. In that case, the target
    device shares memory with the encountering thread.

For PTX, this would use CUDA's Dynamic Parallelism,
<http://devblogs.nvidia.com/parallelforall/introduction-cuda-dynamic-parallelism/>,
for example.
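
In OpenACC source terms, such nesting might look roughly like the sketch
below.  This is purely illustrative (GCC does not implement it, and the
routine clause shown is just a placeholder, not a spec requirement):

    /* Purely illustrative sketch of nested OpenACC parallelism as
       described above; GCC does not implement this.  The routine and
       clause choices here are assumptions.  */
    #pragma acc routine seq
    void
    scale (float *a, int n)
    {
      /* Nested compute construct, encountered by an accelerator thread;
         per OpenACC 2.0a 2.6 it targets the same device and shares its
         memory.  */
    #pragma acc parallel loop
      for (int i = 0; i < n; i++)
        a[i] *= 2.0f;
    }

    void
    caller (float *a, int n)
    {
    #pragma acc parallel copy(a[0:n])
      scale (a, n);
    }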

> For intelmic, the offloading shared libraries are normally linked against
> -lgomp and thus can call any functions from there.
> For nvptx, libgomp still needs to be ported to that target.
> So, what we can do is e.g. ignore the nested #pragma omp target* regions
> inside of #pragma omp target, or turn them into __builtin_trap ().
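
For reference, the nesting under discussion is something like the following
made-up testcase, which GCC currently compiles with just a warning and which
then fails at run time because GOMP_target* gets called on the device:

    /* Made-up testcase: a target construct nested in a target region.
       The OpenMP spec leaves the behavior unspecified; GCC compiles it
       with a warning, and the binary fails at run time because the
       offloaded code tries to call GOMP_target* on the device.  */
    int
    main (void)
    {
      int x = 0;

    #pragma omp target map(tofrom: x)
      {
        /* Nested target construct, encountered on the device.  */
    #pragma omp target map(tofrom: x)
        x = 1;
      }

      return x;
    }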


Regards,
 Thomas
