On Thu, 15 Oct 2015, Jakub Jelinek wrote:
> Looking at Cuda, for async target region kernels we'd probably use
> a non-default stream and enqueue the async kernel in there.  I see
> we can e.g. cudaEventRecord into the stream and then either cudaEventQuery
> to busy poll the event, or cudaEventSynchronize to block until the event
> occurs, plus there is cudaStreamWaitEvent that perhaps might be even used to
> resolve the above mentioned mapping/unmapping async issues for Cuda
> - like add an event after the mapping operations that the other target tasks
> could wait for if they see any in_flux stuff, and wait for an event etc.
> I don't see a possibility to have something like a callback on stream
> completion though, so it has to be handled with polling.

Not sure why you say so.  There's cu[da]StreamAddCallback, which exists
exactly for registering a completion callback, but it comes with restrictions:

  - this functionality doesn't currently work through CUDA MPS ("multi-process
    server", for funneling CUDA calls from different processes through a
    single "server" process, avoiding context-switch overhead on the device,
    sometimes used for CUDA-with-MPI applications);

  - it is explicitly forbidden to invoke CUDA API calls from the callback;
    perhaps understandable, as the callback may be running in a signal-handler
    context (unlikely), or, more plausibly, in a different thread than the one
    that registered the callback.

Ideally we'd queue all accelerator work up front via
cu[da]EventRecord/cu[da]StreamWaitEvent, and not rely on callbacks.  If
host-side work must be done on completion, we could spawn a helper thread
that blocks in cudaEventSynchronize.
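
To make the helper-thread idea concrete, here is a minimal sketch in plain C
with pthreads.  It is not libgomp code: struct fake_event,
fake_event_synchronize and fake_event_fire are hypothetical stand-ins for a
CUDA event, cudaEventSynchronize, and the device reaching a recorded event, so
that the sketch runs without a GPU.  In a real plugin the helper would block in
cudaEventSynchronize on an event recorded with cudaEventRecord.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a CUDA event (assumption, not a CUDA type).  */
struct fake_event
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  int fired;
};

/* Stand-in for cudaEventSynchronize: block until the event has fired.  */
static void
fake_event_synchronize (struct fake_event *ev)
{
  pthread_mutex_lock (&ev->lock);
  while (!ev->fired)
    pthread_cond_wait (&ev->cond, &ev->lock);
  pthread_mutex_unlock (&ev->lock);
}

/* Stand-in for the device reaching the recorded event.  */
static void
fake_event_fire (struct fake_event *ev)
{
  pthread_mutex_lock (&ev->lock);
  ev->fired = 1;
  pthread_cond_signal (&ev->cond);
  pthread_mutex_unlock (&ev->lock);
}

/* Host-side work to run once the event has completed.  */
struct completion
{
  struct fake_event *ev;
  void (*fn) (void *);
  void *data;
};

/* Helper thread: block on the event, then run the host-side work.  This is
   an ordinary thread we own, so it is not subject to the restrictions that
   apply inside a stream callback.  */
static void *
helper_thread (void *arg)
{
  struct completion *c = arg;
  fake_event_synchronize (c->ev);
  c->fn (c->data);
  return NULL;
}

static void
mark_done (void *data)
{
  *(int *) data = 1;
}

/* Returns 1 if the completion work ran, 0 if the helper could not start.  */
int
run_with_helper_thread (void)
{
  struct fake_event ev;
  int done = 0;
  struct completion c = { &ev, mark_done, &done };
  pthread_t tid;

  pthread_mutex_init (&ev.lock, NULL);
  pthread_cond_init (&ev.cond, NULL);
  ev.fired = 0;

  if (pthread_create (&tid, NULL, helper_thread, &c) != 0)
    return 0;
  /* The main thread is free to do other work here; no polling needed.  */
  fake_event_fire (&ev);	/* Simulated device completion.  */
  pthread_join (tid, NULL);

  pthread_cond_destroy (&ev.cond);
  pthread_mutex_destroy (&ev.lock);
  return done;
}
```

The cost of this approach is one blocked host thread per outstanding wait, so
a real implementation would probably want a single helper servicing a queue of
events rather than a thread per target task.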

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 +1,40 @@
> > +#include <stdio.h>
> > +#include <unistd.h>
> > +
> > +#pragma omp declare target
> > +void foo (int n)
> > +{
> > +  printf ("Start tgt %d\n", n);
> > +  usleep (5000000);
> 
> 5s is too long.  Not to mention that not sure if PTX can do printf
> and especially usleep.

printf is available; usleep is not (but presumably the use of usleep needs to
be revised anyway).

Alexander
