On Thu, 15 Oct 2015, Jakub Jelinek wrote:
> Looking at Cuda, for async target region kernels we'd probably use
> a non-default stream and enqueue the async kernel in there. I see
> we can e.g. cudaEventRecord into the stream and then either cudaEventQuery
> to busy poll the event, or cudaEventSynchronize to block until the event
> occurs, plus there is cudaStreamWaitEvent that perhaps might be even used to
> resolve the above mentioned mapping/unmapping async issues for Cuda
> - like add an event after the mapping operations that the other target tasks
> could wait for if they see any in_flux stuff, and wait for an event etc.
> I don't see a possibility to have something like a callback on stream
> completion though, so it has to be handled with polling.
Not sure why you say so. There's cu[da]StreamAddCallback, which exists
exactly for registering a completion callback, but there are restrictions:
- this functionality doesn't currently work through CUDA MPS ("multi-process
server", for funneling CUDA calls from different processes through a
single "server" process, avoiding context-switch overhead on the device,
sometimes used for CUDA-with-MPI applications);
- it is explicitly forbidden to invoke CUDA API calls from the callback;
perhaps understandable, as the callback may be running in a signal-handler
context (unlikely), or, more plausibly, in a different thread than the one
that registered the callback.
Ideally we'd queue all accelerator work up front via
EventRecord/StreamWaitEvent, and not rely on callbacks. If host-side work
must be done on completion, we could spawn a helper thread waiting on
cudaEventSynchronize.
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 +1,40 @@
> > +#include <stdio.h>
> > +#include <unistd.h>
> > +
> > +#pragma omp declare target
> > +void foo (int n)
> > +{
> > + printf ("Start tgt %d\n", n);
> > + usleep (5000000);
>
> 5s is too long. Not to mention that not sure if PTX can do printf
> and especially usleep.
printf is available, usleep is not (but presumably the use of usleep needs
to be revised anyway).
Alexander